[GH-ISSUE #14836] GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed when loading gpt-oss:20b (MXFP4) on CUDA — runner crash #56086

Open
opened 2026-04-29 10:14:57 -05:00 by GiteaMirror · 2 comments

Originally created by @MonjushaPreeti on GitHub (Mar 13, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14836

What is the issue?

Bug description

Loading the model gpt-oss:20b (MXFP4, 20.9B parameters) causes the Ollama runner to crash with an assertion failure in the CUDA backend. The server returns HTTP 500 and "model failed to load".

Assertion that fails

//ml/backend/ggml/ggml/src/ggml-cuda/cpy.cu:396: GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed
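In other words, the source tensor handed to the CUDA copy is larger than INT_MAX bytes (2,147,483,647, about 2.1 GB), and the copy path asserts that the byte count fits in a signed 32-bit int.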
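
To make the failing condition concrete, here is a minimal Python sketch of the same comparison. It assumes INT_MAX is 2^31 - 1, as on platforms where int is 32 bits. The helper name passes_copy_size_check is hypothetical and only mirrors the asserted expression; it is not ggml code.

```python
# Sketch of the asserted condition ggml_nbytes(src0) <= INT_MAX.
# Assumption: int is 32 bits, so INT_MAX == 2**31 - 1.
INT_MAX = 2**31 - 1


def passes_copy_size_check(nbytes: int) -> bool:
    """Return True when a tensor of `nbytes` bytes would satisfy the assertion."""
    return nbytes <= INT_MAX


if __name__ == "__main__":
    print(INT_MAX)                                   # 2147483647
    print(round(INT_MAX / 1e9, 3))                   # limit in decimal GB
    print(passes_copy_size_check(2 * 1024**3 - 1))   # one byte under 2 GiB
    print(passes_copy_size_check(2 * 1024**3))       # exactly 2 GiB
```

The limit applies to the single tensor passed to the copy, not to the ~13.8 GB model file as a whole.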

Steps to reproduce

  1. Install Ollama 0.17.6.
  2. Pull model: ollama pull gpt-oss:20b
  3. Start server: OLLAMA_NUM_PARALLEL=8 OLLAMA_MAX_LOADED_MODELS=1 ollama serve 2>&1
  4. In another terminal: curl -s http://127.0.0.1:11434/api/generate -d '{"model":"gpt-oss:20b","prompt":"Hi","stream":false}'
  5. Runner crashes (SIGABRT, exit status 2); curl gets {"error":"model failed to load, ..."}.
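
For scripted reproduction, step 4 can be sketched in Python with only the standard library. The URL and JSON fields come from the curl command above; the function names and the 600-second timeout are assumptions made for this example.

```python
import json
import urllib.error
import urllib.request

# Endpoint taken from step 4 of the report.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_generate_request(model: str = "gpt-oss:20b", prompt: str = "Hi") -> urllib.request.Request:
    """Build a non-streaming POST with a JSON body equivalent to the curl payload."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0):
    """Send the request and return (HTTP status, body text), including on HTTP errors."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # The report describes an HTTP 500 whose JSON body carries an "error" field.
        return err.code, err.read().decode("utf-8")


if __name__ == "__main__":
    status, text = send(build_generate_request())
    print(status, text)
```

Run it in a second terminal while `ollama serve` from step 3 is running; on the affected setup the expectation is a 500 status with the "model failed to load" error described in step 5.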

Environment

  • Ollama version: 0.17.6
  • OS: Linux (aarch64)
  • GPU: NVIDIA GB10, 119.6 GiB VRAM, compute capability 12.1
  • Model: gpt-oss:20b, format GGUF, family gptoss, quantization MXFP4, size ~13.8 GB

Server log excerpt (from ollama serve terminal when the request is sent)

  • Warning "requested context size too large for model" (num_ctx=262144, n_ctx_train=131072); flash attention is then enabled and the load starts.
  • ggml reports architecture=gptoss, file_type=MXFP4, and then the runner aborts with:
    • ggml-cuda/cpy.cu:396: GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed
  • The server then logs a "do load request" error (EOF / connection refused to runner port), "Load failed", and "llama runner terminated" error="exit status 2", and [GIN] records a 500 for POST "/api/generate".
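
The numbers in this excerpt can be combined with the load request in the full log further down (Parallel:8, BatchSize:512, KvSize:1048576). The sketch below rests on two assumptions: that the runner clamps num_ctx to n_ctx_train, and that it multiplies the result by the parallel count. Both are inferred from the logged values and not confirmed against the Ollama source. The f32 tensor of shape BatchSize × KvSize at the end is purely hypothetical; the log does not identify which tensor tripped the assertion.

```python
INT_MAX = 2**31 - 1

# Values copied from the server log in this report.
num_ctx_requested = 262144
n_ctx_train = 131072
parallel = 8
batch_size = 512
logged_kv_size = 1048576

# Assumption: the per-sequence context is clamped to the training context.
effective_ctx = min(num_ctx_requested, n_ctx_train)

# Assumption: KvSize is the per-sequence context times the parallel count.
kv_size = effective_ctx * parallel
print(kv_size == logged_kv_size)  # True

# Largest whole number of bytes per position that keeps a tensor
# with kv_size positions within INT_MAX.
max_bytes_per_position = INT_MAX // kv_size
print(max_bytes_per_position)  # 2047

# Hypothetical example only: an f32 tensor of shape [batch_size, kv_size].
hypothetical_nbytes = batch_size * kv_size * 4
print(hypothetical_nbytes, hypothetical_nbytes > INT_MAX)  # 2147483648 True
```

If that inference holds, lowering OLLAMA_NUM_PARALLEL or the context length would shrink KvSize proportionally. Whether doing so avoids the assertion was not tested in this report.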

Expected behavior

The model should load and respond to generate requests when enough GPU memory is available (e.g. 120 GB VRAM).

Possible fix

The CUDA copy path in ggml-cuda/cpy.cu (around line 396) should support tensor sizes larger than INT_MAX (e.g. use size_t or 64-bit size instead of int for the byte count).
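
To illustrate why the width of the byte count matters, here is a small Python sketch that uses ctypes to mimic C integer types. It does not reproduce the code in cpy.cu, which this report does not quote. It only shows what happens to a value just over INT_MAX when stored in a signed 32-bit integer versus an unsigned 64-bit integer (assumed here to match size_t on a 64-bit platform).

```python
import ctypes

INT_MAX = 2**31 - 1
nbytes = INT_MAX + 1  # a byte count one past the asserted limit

# ctypes integer types do no overflow checking, so the stored value wraps
# the way a narrowing conversion typically does on two's-complement hardware.
as_int32 = ctypes.c_int32(nbytes).value
as_uint64 = ctypes.c_uint64(nbytes).value

print(as_int32)   # -2147483648
print(as_uint64)  # 2147483648
```

A 64-bit count represents the size correctly, which is the direction the suggested fix points in. Any index arithmetic or launch parameters derived from the count would presumably need the same widening.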

Relevant log output

OLLAMA_NUM_PARALLEL=8 OLLAMA_MAX_LOADED_MODELS=1 ollama serve 2>&1
time=2026-03-13T16:33:42.398-06:00 level=INFO source=routes.go:1664 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/neelima/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:8 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-03-13T16:33:42.398-06:00 level=INFO source=routes.go:1666 msg="Ollama cloud disabled: false"
time=2026-03-13T16:33:42.399-06:00 level=INFO source=images.go:477 msg="total blobs: 5"
time=2026-03-13T16:33:42.399-06:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-13T16:33:42.399-06:00 level=INFO source=routes.go:1719 msg="Listening on 127.0.0.1:11434 (version 0.17.6)"
time=2026-03-13T16:33:42.399-06:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-13T16:33:42.400-06:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 43959"
time=2026-03-13T16:33:42.790-06:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 43717"
time=2026-03-13T16:33:43.105-06:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 39151"
time=2026-03-13T16:33:43.105-06:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 34887"
time=2026-03-13T16:33:43.566-06:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-bceb0c12-9cfd-b91c-3f26-4bfbf5ed1344 filter_id="" library=CUDA compute=12.1 name=CUDA0 description="NVIDIA GB10" libdirs=ollama,cuda_v13 driver=13.0 pci_id=000f:01:00.0 type=iGPU total="119.6 GiB" available="105.6 GiB"
time=2026-03-13T16:33:43.566-06:00 level=INFO source=routes.go:1769 msg="vram-based default context" total_vram="119.6 GiB" default_num_ctx=262144
time=2026-03-13T16:34:03.266-06:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 45275"
time=2026-03-13T16:34:03.732-06:00 level=WARN source=server.go:168 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072
time=2026-03-13T16:34:03.732-06:00 level=INFO source=server.go:246 msg="enabling flash attention"
time=2026-03-13T16:34:03.733-06:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/neelima/.ollama/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb --port 38549"
time=2026-03-13T16:34:03.733-06:00 level=INFO source=sched.go:489 msg="system memory" total="119.6 GiB" free="105.6 GiB" free_swap="16.0 GiB"
time=2026-03-13T16:34:03.733-06:00 level=INFO source=sched.go:496 msg="gpu memory" id=GPU-bceb0c12-9cfd-b91c-3f26-4bfbf5ed1344 library=CUDA available="105.1 GiB" free="105.6 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-13T16:34:03.733-06:00 level=INFO source=server.go:757 msg="loading model" "model layers"=25 requested=-1
time=2026-03-13T16:34:03.745-06:00 level=INFO source=runner.go:1429 msg="starting ollama engine"
time=2026-03-13T16:34:03.745-06:00 level=INFO source=runner.go:1464 msg="Server listening on 127.0.0.1:38549"
time=2026-03-13T16:34:03.756-06:00 level=INFO source=runner.go:1302 msg=load request="{Operation:fit LoraPath:[] Parallel:8 BatchSize:512 FlashAttention:Enabled KvSize:1048576 KvCacheType: NumThreads:20 GPULayers:25[ID:GPU-bceb0c12-9cfd-b91c-3f26-4bfbf5ed1344 Layers:25(0..24)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-13T16:34:03.798-06:00 level=INFO source=ggml.go:136 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=459 num_key_values=32
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GB10, compute capability 12.1, VMM: yes, ID: GPU-bceb0c12-9cfd-b91c-3f26-4bfbf5ed1344
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v13/libggml-cuda.so
time=2026-03-13T16:34:04.099-06:00 level=INFO source=ggml.go:104 msg=system CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.LLAMAFILE=1 CPU.1.NEON=1 CPU.1.ARM_FMA=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
//ml/backend/ggml/ggml/src/ggml-cuda/cpy.cu:396: GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed
[New LWP 394762]
[New LWP 394761]
[New LWP 394760]
[New LWP 394759]
[New LWP 394758]
[New LWP 394747]
[New LWP 394743]
[New LWP 394742]
[New LWP 394741]
[New LWP 394740]
[New LWP 394739]
[New LWP 394738]
[New LWP 394737]
[New LWP 394736]
[New LWP 394735]
[New LWP 394734]
[New LWP 394733]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000c34ca208145c in ?? ()
#0  0x0000c34ca208145c in ?? ()
#1  0x0000000000000080 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
[Inferior 1 (process 394732) detached]
SIGABRT: abort
PC=0xebddc1837608 m=15 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 52 gp=0x4000505340 m=15 mp=0x4000600808 [syscall]:
runtime.cgocall(0xc34ca2dba634, 0x40017950a8)
        runtime/cgocall.go:167 +0x44 fp=0x4001795060 sp=0x4001795020 pc=0xc34ca2074ae4
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_reserve(0xc34cdfd16dd0, 0xebdaf4f9f170)
        _cgo_gotypes.go:1012 +0x34 fp=0x40017950a0 sp=0x4001795060 pc=0xc34ca24e6f04
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve.func2(...)
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:850
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve(0x400059a080)
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:850 +0xe0 fp=0x4001795330 sp=0x40017950a0 pc=0xc34ca24f18a0
github.com/ollama/ollama/runner/ollamarunner.(*Server).reserveWorstCaseGraph(0x40002550e0, 0x1)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1187 +0x834 fp=0x4001795660 sp=0x4001795330 pc=0xc34ca25e7424
github.com/ollama/ollama/runner/ollamarunner.(*Server).allocModel(0x40002550e0, {0xffffd418fa9e?, 0x0?}, {0x0, 0x14, {0x400059b900, 0x1, 0x1}, 0x1}, {0x0?, ...}, ...)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1250 +0x2e4 fp=0x4001795710 sp=0x4001795660 pc=0xc34ca25e7af4
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0x40002550e0, {0xc34ca36bd720, 0x40001f9500}, 0x400012a3c0)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1329 +0x460 fp=0x4001795aa0 sp=0x4001795710 pc=0xc34ca25e8400
github.com/ollama/ollama/runner/ollamarunner.(*Server).load-fm({0xc34ca36bd720?, 0x40001f9500?}, 0x4000513b28?)
        <autogenerated>:1 +0x40 fp=0x4001795ad0 sp=0x4001795aa0 pc=0xc34ca25ea2b0
net/http.HandlerFunc.ServeHTTP(0x40000fc3c0?, {0xc34ca36bd720?, 0x40001f9500?}, 0x4000513b10?)
        net/http/server.go:2294 +0x38 fp=0x4001795b00 sp=0x4001795ad0 pc=0xc34ca2331be8
net/http.(*ServeMux).ServeHTTP(0x10?, {0xc34ca36bd720, 0x40001f9500}, 0x400012a3c0)
        net/http/server.go:2822 +0x1b4 fp=0x4001795b50 sp=0x4001795b00 pc=0xc34ca2333774
net/http.serverHandler.ServeHTTP({0xc34ca36b9a10?}, {0xc34ca36bd720?, 0x40001f9500?}, 0x1?)
        net/http/server.go:3301 +0xbc fp=0x4001795b80 sp=0x4001795b50 pc=0xc34ca234f45c
net/http.(*conn).serve(0x400012c480, {0xc34ca36bfe78, 0x4000129380})
        net/http/server.go:2102 +0x52c fp=0x4001795fa0 sp=0x4001795b80 pc=0xc34ca233038c
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3454 +0x30 fp=0x4001795fd0 sp=0x4001795fa0 pc=0xc34ca2335550
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4001795fd0 sp=0x4001795fd0 pc=0xc34ca2080004
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3454 +0x3d8

goroutine 1 gp=0x40000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4001797710 sp=0x40017976f0 pc=0xc34ca2077ff8
runtime.netpollblock(0x7000000000?, 0x6?, 0x0?)
        runtime/netpoll.go:575 +0x158 fp=0x4001797750 sp=0x4001797710 pc=0xc34ca203cde8
internal/poll.runtime_pollWait(0xebdd7a4b7f30, 0x72)
        runtime/netpoll.go:351 +0xa0 fp=0x4001797780 sp=0x4001797750 pc=0xc34ca20771b0
internal/poll.(*pollDesc).wait(0x4000124780?, 0xc34ca2100428?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x40017977b0 sp=0x4001797780 pc=0xc34ca20f99c8
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x4000124780)
        internal/poll/fd_unix.go:620 +0x24c fp=0x4001797860 sp=0x40017977b0 pc=0xc34ca20fe29c
net.(*netFD).accept(0x4000124780)
        net/fd_unix.go:172 +0x28 fp=0x4001797920 sp=0x4001797860 pc=0xc34ca216d108
net.(*TCPListener).accept(0x400059b780)
        net/tcpsock_posix.go:159 +0x24 fp=0x4001797970 sp=0x4001797920 pc=0xc34ca21825a4
net.(*TCPListener).Accept(0x400059b780)
        net/tcpsock.go:380 +0x2c fp=0x40017979b0 sp=0x4001797970 pc=0xc34ca218153c
net/http.(*onceCloseListener).Accept(0x400012c480?)
        <autogenerated>:1 +0x30 fp=0x40017979d0 sp=0x40017979b0 pc=0xc34ca235ba80
net/http.(*Server).Serve(0x4000507700, {0xc34ca36bd540, 0x400059b780})
        net/http/server.go:3424 +0x290 fp=0x4001797b00 sp=0x40017979d0 pc=0xc34ca23351c0
github.com/ollama/ollama/runner/ollamarunner.Execute({0x40000320a0, 0x4, 0x4})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1465 +0x7fc fp=0x4001797cd0 sp=0x4001797b00 pc=0xc34ca25e9cdc
github.com/ollama/ollama/runner.Execute({0x4000032080?, 0x0?, 0x0?})
        github.com/ollama/ollama/runner/runner.go:18 +0x14c fp=0x4001797d10 sp=0x4001797cd0 pc=0xc34ca2670b3c
github.com/ollama/ollama/cmd.NewCLI.func3(0x4000507300?, {0xc34ca30d1249?, 0x4?, 0xc34ca30d124d?})
        github.com/ollama/ollama/cmd/cmd.go:2259 +0x54 fp=0x4001797d40 sp=0x4001797d10 pc=0xc34ca2d692b4
github.com/spf13/cobra.(*Command).execute(0x4000131b08, {0x40004fe640, 0x5, 0x5})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x4001797e60 sp=0x4001797d40 pc=0xc34ca21dce08
github.com/spf13/cobra.(*Command).ExecuteC(0x40004da908)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x4001797f20 sp=0x4001797e60 pc=0xc34ca21dd550
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        github.com/ollama/ollama/main.go:12 +0x54 fp=0x4001797f40 sp=0x4001797f20 pc=0xc34ca2d6b144
runtime.main()
        runtime/proc.go:283 +0x284 fp=0x4001797fd0 sp=0x4001797f40 pc=0xc34ca2044194
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4001797fd0 sp=0x4001797fd0 pc=0xc34ca2080004

goroutine 2 gp=0x4000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400009af90 sp=0x400009af70 pc=0xc34ca2077ff8
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.forcegchelper()
        runtime/proc.go:348 +0xb8 fp=0x400009afd0 sp=0x400009af90 pc=0xc34ca20444e8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400009afd0 sp=0x400009afd0 pc=0xc34ca2080004
created by runtime.init.7 in goroutine 1
        runtime/proc.go:336 +0x24

goroutine 3 gp=0x4000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400009b760 sp=0x400009b740 pc=0xc34ca2077ff8
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.bgsweep(0x4000042100)
        runtime/mgcsweep.go:316 +0x108 fp=0x400009b7b0 sp=0x400009b760 pc=0xc34ca202ed18
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x28 fp=0x400009b7d0 sp=0x400009b7b0 pc=0xc34ca2022b48
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400009b7d0 sp=0x400009b7d0 pc=0xc34ca2080004
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x6c

goroutine 4 gp=0x4000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0xc34ca32e55d8?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400009bf60 sp=0x400009bf40 pc=0xc34ca2077ff8
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.(*scavengerState).park(0xc34ca409f560)
        runtime/mgcscavenge.go:425 +0x5c fp=0x400009bf90 sp=0x400009bf60 pc=0xc34ca202c7dc
runtime.bgscavenge(0x4000042100)
        runtime/mgcscavenge.go:658 +0xac fp=0x400009bfb0 sp=0x400009bf90 pc=0xc34ca202cd5c
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x28 fp=0x400009bfd0 sp=0x400009bfb0 pc=0xc34ca2022ae8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400009bfd0 sp=0x400009bfd0 pc=0xc34ca2080004
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xac

goroutine 5 gp=0x4000003c00 m=nil [finalizer wait]:
runtime.gopark(0x18000001b8?, 0xebddc1778ef0?, 0x8?, 0x1?, 0x1c0?)
        runtime/proc.go:435 +0xc8 fp=0x400009a590 sp=0x400009a570 pc=0xc34ca2077ff8
runtime.runfinq()
        runtime/mfinal.go:196 +0x108 fp=0x400009a7d0 sp=0x400009a590 pc=0xc34ca2021b48
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400009a7d0 sp=0x400009a7d0 pc=0xc34ca2080004
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:166 +0x80

goroutine 6 gp=0x4000210700 m=nil [chan receive]:
runtime.gopark(0x40001b1b80?, 0x4000508000?, 0x48?, 0xc7?, 0xc34ca2145628?)
        runtime/proc.go:435 +0xc8 fp=0x400009c6f0 sp=0x400009c6d0 pc=0xc34ca2077ff8
runtime.chanrecv(0x400004c380, 0x0, 0x1)
        runtime/chan.go:664 +0x42c fp=0x400009c770 sp=0x400009c6f0 pc=0xc34ca2013bac
runtime.chanrecv1(0x0?, 0x0?)
        runtime/chan.go:506 +0x14 fp=0x400009c7a0 sp=0x400009c770 pc=0xc34ca2013744
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
        runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1799 +0x3c fp=0x400009c7d0 sp=0x400009c7a0 pc=0xc34ca2025d6c
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400009c7d0 sp=0x400009c7d0 pc=0xc34ca2080004
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1794 +0x78

goroutine 7 gp=0x4000210a80 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400009cf10 sp=0x400009cef0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x400009cfb0 sp=0x400009cf10 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400009cfd0 sp=0x400009cfb0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400009cfd0 sp=0x400009cfd0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 18 gp=0x4000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000096710 sp=0x40000966f0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40000967b0 sp=0x4000096710 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40000967d0 sp=0x40000967b0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000967d0 sp=0x40000967d0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 19 gp=0x4000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000096f10 sp=0x4000096ef0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x4000096fb0 sp=0x4000096f10 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x4000096fd0 sp=0x4000096fb0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000096fd0 sp=0x4000096fd0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 8 gp=0x4000210c40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400009d710 sp=0x400009d6f0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x400009d7b0 sp=0x400009d710 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400009d7d0 sp=0x400009d7b0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400009d7d0 sp=0x400009d7d0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 9 gp=0x4000210e00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400009df10 sp=0x400009def0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x400009dfb0 sp=0x400009df10 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400009dfd0 sp=0x400009dfb0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400009dfd0 sp=0x400009dfd0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 10 gp=0x4000210fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40004aa710 sp=0x40004aa6f0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40004aa7b0 sp=0x40004aa710 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40004aa7d0 sp=0x40004aa7b0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40004aa7d0 sp=0x40004aa7d0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 11 gp=0x4000211180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40004aaf10 sp=0x40004aaef0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40004aafb0 sp=0x40004aaf10 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40004aafd0 sp=0x40004aafb0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40004aafd0 sp=0x40004aafd0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 12 gp=0x4000211340 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40004ab710 sp=0x40004ab6f0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40004ab7b0 sp=0x40004ab710 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40004ab7d0 sp=0x40004ab7b0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40004ab7d0 sp=0x40004ab7d0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 34 gp=0x4000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40004a6710 sp=0x40004a66f0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40004a67b0 sp=0x40004a6710 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40004a67d0 sp=0x40004a67b0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40004a67d0 sp=0x40004a67d0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 35 gp=0x40005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0x2781f46708ce6?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40004a6f10 sp=0x40004a6ef0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40004a6fb0 sp=0x40004a6f10 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40004a6fd0 sp=0x40004a6fb0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40004a6fd0 sp=0x40004a6fd0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 20 gp=0x4000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x2781f46701716?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000097710 sp=0x40000976f0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40000977b0 sp=0x4000097710 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40000977d0 sp=0x40000977b0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000977d0 sp=0x40000977d0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 13 gp=0x4000211500 m=nil [GC worker (idle)]:
runtime.gopark(0x2781f467bd775?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40004abf10 sp=0x40004abef0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40004abfb0 sp=0x40004abf10 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40004abfd0 sp=0x40004abfb0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40004abfd0 sp=0x40004abfd0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 14 gp=0x40002116c0 m=nil [GC worker (idle)]:
runtime.gopark(0xc34ca417a740?, 0x1?, 0x10?, 0xca?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40004ac710 sp=0x40004ac6f0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40004ac7b0 sp=0x40004ac710 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40004ac7d0 sp=0x40004ac7b0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40004ac7d0 sp=0x40004ac7d0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 21 gp=0x40001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0xc34ca417a740?, 0x1?, 0xff?, 0x27?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000097f10 sp=0x4000097ef0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x4000097fb0 sp=0x4000097f10 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x4000097fd0 sp=0x4000097fb0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000097fd0 sp=0x4000097fd0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 36 gp=0x4000504380 m=nil [GC worker (idle)]:
runtime.gopark(0x2781f46776b85?, 0x1?, 0xa0?, 0x8d?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40004a7710 sp=0x40004a76f0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40004a77b0 sp=0x40004a7710 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40004a77d0 sp=0x40004a77b0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40004a77d0 sp=0x40004a77d0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 22 gp=0x4000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0xc34ca417a740?, 0x1?, 0x3f?, 0x1c?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000098710 sp=0x40000986f0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40000987b0 sp=0x4000098710 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40000987d0 sp=0x40000987b0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000987d0 sp=0x40000987d0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 37 gp=0x4000504540 m=nil [GC worker (idle)]:
runtime.gopark(0x2781f467033b6?, 0x1?, 0xa0?, 0x47?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40004a7f10 sp=0x40004a7ef0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40004a7fb0 sp=0x40004a7f10 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40004a7fd0 sp=0x40004a7fb0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40004a7fd0 sp=0x40004a7fd0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 15 gp=0x4000211880 m=nil [GC worker (idle)]:
runtime.gopark(0x2781f46713815?, 0x1?, 0xa0?, 0x12?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40004acf10 sp=0x40004acef0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40004acfb0 sp=0x40004acf10 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40004acfd0 sp=0x40004acfb0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40004acfd0 sp=0x40004acfd0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 16 gp=0x4000211a40 m=nil [GC worker (idle)]:
runtime.gopark(0x2781f4670c0d6?, 0x3?, 0xe0?, 0x79?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40004ad710 sp=0x40004ad6f0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40004ad7b0 sp=0x40004ad710 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40004ad7d0 sp=0x40004ad7b0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40004ad7d0 sp=0x40004ad7d0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 50 gp=0x4000211c00 m=nil [GC worker (idle)]:
runtime.gopark(0xc34ca417a740?, 0x1?, 0xdf?, 0x72?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40004adf10 sp=0x40004adef0 pc=0xc34ca2077ff8
runtime.gcBgMarkWorker(0x400004d7a0)
        runtime/mgc.go:1423 +0xdc fp=0x40004adfb0 sp=0x40004adf10 pc=0xc34ca2024fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40004adfd0 sp=0x40004adfb0 pc=0xc34ca2024ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40004adfd0 sp=0x40004adfd0 pc=0xc34ca2080004
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140

goroutine 51 gp=0x4000505180 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0xc34ca40b7d20?, 0x0?, 0x80?, 0xc1?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40000ada90 sp=0x40000ada70 pc=0xc34ca2077ff8
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.semacquire1(0x4000255198, 0x0, 0x1, 0x0, 0x18)
        runtime/sema.go:188 +0x204 fp=0x40000adae0 sp=0x40000ada90 pc=0xc34ca2058634
sync.runtime_SemacquireWaitGroup(0x0?)
        runtime/sema.go:110 +0x2c fp=0x40000adb20 sp=0x40000adae0 pc=0xc34ca20799ac
sync.(*WaitGroup).Wait(0x4000255190)
        sync/waitgroup.go:118 +0x70 fp=0x40000adb40 sp=0x40000adb20 pc=0xc34ca208b6e0
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0x40002550e0, {0xc34ca36bfeb0, 0x40004fe6e0})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:442 +0x38 fp=0x40000adfa0 sp=0x40000adb40 pc=0xc34ca25e1db8
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1442 +0x30 fp=0x40000adfd0 sp=0x40000adfa0 pc=0xc34ca25e9f00
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000adfd0 sp=0x40000adfd0 pc=0xc34ca2080004
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1442 +0x448

goroutine 53 gp=0x4000505500 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x40004a8580 sp=0x40004a8560 pc=0xc34ca2077ff8
runtime.netpollblock(0x0?, 0xffffffff?, 0xff?)
        runtime/netpoll.go:575 +0x158 fp=0x40004a85c0 sp=0x40004a8580 pc=0xc34ca203cde8
internal/poll.runtime_pollWait(0xebdd7a4b7e18, 0x72)
        runtime/netpoll.go:351 +0xa0 fp=0x40004a85f0 sp=0x40004a85c0 pc=0xc34ca20771b0
internal/poll.(*pollDesc).wait(0x4000124800?, 0x4000129481?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x40004a8620 sp=0x40004a85f0 pc=0xc34ca20f99c8
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x4000124800, {0x4000129481, 0x1, 0x1})
        internal/poll/fd_unix.go:165 +0x1fc fp=0x40004a86c0 sp=0x40004a8620 pc=0xc34ca20fac7c
net.(*netFD).Read(0x4000124800, {0x4000129481?, 0x0?, 0x0?})
        net/fd_posix.go:55 +0x28 fp=0x40004a8710 sp=0x40004a86c0 pc=0xc34ca216b6d8
net.(*conn).Read(0x400009ea20, {0x4000129481?, 0x0?, 0x0?})
        net/net.go:194 +0x34 fp=0x40004a8760 sp=0x40004a8710 pc=0xc34ca2178e14
net/http.(*connReader).backgroundRead(0x4000129470)
        net/http/server.go:690 +0x40 fp=0x40004a87b0 sp=0x40004a8760 pc=0xc34ca232ad00
net/http.(*connReader).startBackgroundRead.gowrap2()
        net/http/server.go:686 +0x28 fp=0x40004a87d0 sp=0x40004a87b0 pc=0xc34ca232abe8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40004a87d0 sp=0x40004a87d0 pc=0xc34ca2080004
created by net/http.(*connReader).startBackgroundRead in goroutine 52
        net/http/server.go:686 +0xc4

r0      0x0
r1      0x60609
r2      0x6
r3      0xebdaddcf0140
r4      0xebddc1dfcb50
r5      0x1
r6      0x20
r7      0xebdaddcee8d0
r8      0x83
r9      0x0
r10     0x53
r11     0x101010101010101
r12     0xc
r13     0x0
r14     0x0
r15     0x38
r16     0x1
r17     0xebddc17d7d0c
r18     0x3b0108e
r19     0x60609
r20     0xebdaddcf0140
r21     0x6
r22     0xebdd35e667a8
r23     0xebdaddcef690
r24     0xebdaf4f9da70
r25     0xc34cdfd17d68
r26     0xebdaf4f9da70
r27     0xebdade501760
r28     0x53
r29     0xebdaddcee860
lr      0xebddc18375f4
sp      0xebdaddcee850
pc      0xebddc1837608
fault   0x0
time=2026-03-13T16:34:06.149-06:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:38549/load\": EOF"
time=2026-03-13T16:34:06.150-06:00 level=ERROR source=server.go:1205 msg="do load request" error="Post \"http://127.0.0.1:38549/load\": dial tcp 127.0.0.1:38549: connect: connection refused"
time=2026-03-13T16:34:06.150-06:00 level=INFO source=sched.go:516 msg="Load failed" model=/home/neelima/.ollama/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
time=2026-03-13T16:34:06.247-06:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 2"
[GIN] 2026/03/13 - 16:34:06 | 500 |  3.199960591s |       127.0.0.1 | POST     "/api/generate"

OS

Linux

GPU

Nvidia

CPU

Other

Ollama version

0.17.6

ollama server logs for details"
time=2026-03-13T16:34:06.247-06:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 2"
[GIN] 2026/03/13 - 16:34:06 | 500 | 3.199960591s | 127.0.0.1 | POST "/api/generate"
```

### OS

Linux

### GPU

Nvidia

### CPU

Other

### Ollama version

0.17.6
GiteaMirror added the bug label 2026-04-29 10:14:57 -05:00

@rick-github commented on GitHub (Mar 14, 2026):

#14124


@ivaigult commented on GitHub (Apr 19, 2026):

This seems to be a duplicate of #13887. It should have been fixed by ggml-org/llama.cpp#18433.

Reference: github-starred/ollama#56086