[GH-ISSUE #13887] Crash (SIGABRT) with Ministral-3 14B + Parallel 8: GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed in cpy.cu #9087

Open
opened 2026-04-12 21:56:11 -05:00 by GiteaMirror · 4 comments

Originally created by @eXt73 on GitHub (Jan 24, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13887

What is the issue?

Describe the bug

I am encountering a hard crash (SIGABRT) when running the ministral-3 model (14B, 128k context) with high parallelism on a high-end multi-GPU setup.

The issue appears to be an integer overflow in the CUDA backend (ggml-cuda). The crash occurs immediately when the runner initializes the load request for 8 parallel slots.

It works perfectly fine with OLLAMA_NUM_PARALLEL=7.
It crashes consistently with OLLAMA_NUM_PARALLEL=8.

I have plenty of VRAM available (288 GB total, only ~69 GB used), so this is not an OOM issue.

Error Log

The specific error causing the abort is:
//ml/backend/ggml/ggml/src/ggml-cuda/cpy.cu:396: GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed

Relevant log snippet:

time=2026-01-24T19:47:00.984+01:00 level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:8 BatchSize:512 FlashAttention:Enabled KvSize:1048576 KvCacheType:q4_0 NumThreads:144 GPULayers:41[ID:GPU-5b045c01-81df-0a1c-140e-2b695d5501e3 Layers:41(0..40)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
...
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GH200 144G HBM3e, compute capability 9.0, VMM: yes, ID: GPU-5b045c01-81df-0a1c-140e-2b695d5501e3
  Device 1: NVIDIA GH200 144G HBM3e, compute capability 9.0, VMM: yes, ID: GPU-2d948b27-f2d2-0ccc-cac3-c8c5aa8bb9d2
...
//ml/backend/ggml/ggml/src/ggml-cuda/cpy.cu:396: GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed
SIGABRT: abort
PC=0xeef264bc7608 m=8 sigcode=18446744073709551610
signal arrived during cgo execution

Relevant log output

```shell
Jan 24 19:47:00 ai-xxx ollama[775706]: time=2026-01-24T19:47:00.408+01:00 level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 46587"
Jan 24 19:47:00 ai-xxx ollama[775706]: time=2026-01-24T19:47:00.961+01:00 level=INFO source=server.go:245 msg="enabling flash attention"
Jan 24 19:47:00 ai-xxx ollama[775706]: time=2026-01-24T19:47:00.962+01:00 level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-81ad7d0c2192abdf65e10db5ccb22bbf0bd71f44c6c536cb70f6b9816fad6d52 --port 43969"
Jan 24 19:47:00 ai-xxx ollama[775706]: time=2026-01-24T19:47:00.962+01:00 level=INFO source=sched.go:452 msg="system memory" total="1228.0 GiB" free="1190.5 GiB" free_swap="8.0 GiB"
Jan 24 19:47:00 ai-xxx ollama[775706]: time=2026-01-24T19:47:00.962+01:00 level=INFO source=sched.go:459 msg="gpu memory" id=GPU-5b045c01-81df-0a1c-140e-2b695d5501e3 library=CUDA available="130.3 GiB" free="130.8 GiB" minimum="457.0 MiB" overhead="0 B"
Jan 24 19:47:00 ai-xxx ollama[775706]: time=2026-01-24T19:47:00.962+01:00 level=INFO source=sched.go:459 msg="gpu memory" id=GPU-2d948b27-f2d2-0ccc-cac3-c8c5aa8bb9d2 library=CUDA available="141.8 GiB" free="142.2 GiB" minimum="457.0 MiB" overhead="0 B"
Jan 24 19:47:00 ai-xxx ollama[775706]: time=2026-01-24T19:47:00.962+01:00 level=INFO source=server.go:755 msg="loading model" "model layers"=41 requested=-1
Jan 24 19:47:00 ai-xxx ollama[775706]: time=2026-01-24T19:47:00.975+01:00 level=INFO source=runner.go:1405 msg="starting ollama engine"
Jan 24 19:47:00 ai-xxx ollama[775706]: time=2026-01-24T19:47:00.976+01:00 level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:43969"
Jan 24 19:47:00 ai-xxx ollama[775706]: time=2026-01-24T19:47:00.984+01:00 level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:8 BatchSize:512 FlashAttention:Enabled KvSize:1048576 KvCacheType:q4_0 NumThreads:144 GPULayers:41[ID:GPU-5b045c01-81df-0a1c-140e-2b695d5501e3 Layers:41(0..40)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Jan 24 19:47:01 ai-xxx ollama[775706]: time=2026-01-24T19:47:01.013+01:00 level=INFO source=ggml.go:136 msg="" architecture=mistral3 file_type=Q8_0 name="" description="" num_tensors=585 num_key_values=51
Jan 24 19:47:01 ai-xxx ollama[775706]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu.so
Jan 24 19:47:01 ai-xxx ollama[775706]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Jan 24 19:47:01 ai-xxx ollama[775706]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Jan 24 19:47:01 ai-xxx ollama[775706]: ggml_cuda_init: found 2 CUDA devices:
Jan 24 19:47:01 ai-xxx ollama[775706]:   Device 0: NVIDIA GH200 144G HBM3e, compute capability 9.0, VMM: yes, ID: GPU-5b045c01-81df-0a1c-140e-2b695d5501e3
Jan 24 19:47:01 ai-xxx ollama[775706]:   Device 1: NVIDIA GH200 144G HBM3e, compute capability 9.0, VMM: yes, ID: GPU-2d948b27-f2d2-0ccc-cac3-c8c5aa8bb9d2
Jan 24 19:47:01 ai-xxx ollama[775706]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v13/libggml-cuda.so
Jan 24 19:47:01 ai-xxx ollama[775706]: time=2026-01-24T19:47:01.146+01:00 level=INFO source=ggml.go:104 msg=system CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.LLAMAFILE=1 CPU.1.NEON=1 CPU.1.ARM_FMA=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
Jan 24 19:47:03 ai-xxx ollama[775706]: //ml/backend/ggml/ggml/src/ggml-cuda/cpy.cu:396: GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed
Jan 24 19:47:03 ai-xxx ollama[775706]: [New LWP 779534]
```

OS

Linux

GPU

Nvidia

CPU

Other

Ollama version

0.15.0

OS and Hardware
Hardware: 2x NVIDIA GH200 144GB (Total 288GB VRAM)
Driver: CUDA 12.x / Compute Capability 9.0
Model: Ministral-3 14B (128k context window)

Configuration
OLLAMA_NUM_PARALLEL=8 (Crash)
OLLAMA_FLASH_ATTENTION=1
KvCacheType: q4_0

Steps to reproduce
1. Use a machine with large VRAM (capable of holding >2 GB tensors).
2. Set OLLAMA_NUM_PARALLEL=8 (or higher).
3. Run ministral-3 (or any model with a 128k context).
4. The runner crashes immediately upon loading the model/allocating the KV cache.

Analysis
It seems that with Parallel:8 and KvSize:1048576 (1M tokens of total context), one of the internal buffers passed to cpy.cu exceeds INT_MAX (approx. 2.14 GB), triggering the assertion failure in the llama.cpp backend.

GiteaMirror added the bug label 2026-04-12 21:56:11 -05:00

@balki commented on GitHub (Jan 26, 2026):

I got the same assert with nemotron-3-nano:latest with 1M context. It runs fine with 512K context. There is enough VRAM (96 GB).

```
❯ ollama ps
NAME                                ID              SIZE     PROCESSOR    CONTEXT    UNTIL
my-nemotron-3-nano-latest:latest    17ed20e35bbb    45 GB    100% GPU     524288     4 minutes from now
```

```
/startdir/src/ollama/ml/backend/ggml/ggml/src/ggml-cuda/cpy.cu:396: GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed
```

@zolgear commented on GitHub (Feb 6, 2026):

I encountered the same error when loading the glm-ocr model on a DGX Spark (128 GB).

While investigating Issue #14073, I confirmed a SIGABRT error.

I tested OLLAMA_NUM_PARALLEL with values of 1, 4, and 7, but the system crashed in all cases.

Ollama v0.15.5

Logs

```
Started ollama.service - Ollama Service.
time=2026-02-06T20:45:39.207+09:00 level=INFO source=routes.go:1636 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:1h0m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:3 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:7 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-02-06T20:45:39.208+09:00 level=INFO source=images.go:473 msg="total blobs: 8"
time=2026-02-06T20:45:39.208+09:00 level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-02-06T20:45:39.208+09:00 level=INFO source=routes.go:1689 msg="Listening on [::]:11434 (version 0.15.5)"
time=2026-02-06T20:45:39.208+09:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-02-06T20:45:39.209+09:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 41705"
time=2026-02-06T20:45:39.607+09:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 42563"
time=2026-02-06T20:45:40.019+09:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 34141"
time=2026-02-06T20:45:40.020+09:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 38617"
time=2026-02-06T20:45:40.338+09:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-6e49fe13-bd59-6a45-70d8-508ed1ad24ec filter_id="" library=CUDA compute=12.1 name=CUDA0 description="NVIDIA GB10" libdirs=ollama,cuda_v13 driver=13.0 pci_id=000f:01:00.0 type=iGPU total="119.7 GiB" available="115.1 GiB"
time=2026-02-06T20:45:40.338+09:00 level=INFO source=routes.go:1739 msg="vram-based default context" total_vram="119.7 GiB" default_num_ctx=262144
[GIN] 2026/02/06 - 20:45:41 | 200 |      34.512µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/02/06 - 20:45:41 | 200 |   62.190236ms |       127.0.0.1 | POST     "/api/show"
time=2026-02-06T20:45:41.687+09:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40799"
time=2026-02-06T20:45:42.052+09:00 level=WARN source=server.go:168 msg="requested context size too large for model" num_ctx=262144 n_ctx_train=131072
time=2026-02-06T20:45:42.052+09:00 level=INFO source=server.go:246 msg="enabling flash attention"
time=2026-02-06T20:45:42.052+09:00 level=INFO source=server.go:430 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-65493e1f85b9ea4ba3ed793515fde13cbdbea7d74ad2c662b566b146eab0081e --port 40175"
time=2026-02-06T20:45:42.052+09:00 level=INFO source=sched.go:463 msg="system memory" total="119.7 GiB" free="115.0 GiB" free_swap="16.0 GiB"
time=2026-02-06T20:45:42.052+09:00 level=INFO source=sched.go:470 msg="gpu memory" id=GPU-6e49fe13-bd59-6a45-70d8-508ed1ad24ec library=CUDA available="114.6 GiB" free="115.0 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-02-06T20:45:42.052+09:00 level=INFO source=server.go:756 msg="loading model" "model layers"=17 requested=-1
time=2026-02-06T20:45:42.060+09:00 level=INFO source=runner.go:1410 msg="starting ollama engine"
time=2026-02-06T20:45:42.060+09:00 level=INFO source=runner.go:1445 msg="Server listening on 127.0.0.1:40175"
time=2026-02-06T20:45:42.064+09:00 level=INFO source=runner.go:1283 msg=load request="{Operation:fit LoraPath:[] Parallel:7 BatchSize:512 FlashAttention:Enabled KvSize:917504 KvCacheType: NumThreads:20 GPULayers:17[ID:GPU-6e49fe13-bd59-6a45-70d8-508ed1ad24ec Layers:17(0..16)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-06T20:45:42.076+09:00 level=INFO source=ggml.go:136 msg="" architecture=glmocr file_type=F16 name="" description="" num_tensors=527 num_key_values=47
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GB10, compute capability 12.1, VMM: yes, ID: GPU-6e49fe13-bd59-6a45-70d8-508ed1ad24ec
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v13/libggml-cuda.so
time=2026-02-06T20:45:42.409+09:00 level=INFO source=ggml.go:104 msg=system CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.LLAMAFILE=1 CPU.1.NEON=1 CPU.1.ARM_FMA=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
//ml/backend/ggml/ggml/src/ggml-cuda/cpy.cu:396: GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000c4bb2008f120 in ?? ()
#0  0x0000c4bb2008f120 in ?? ()
#1  0x0000000000000004 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
[Inferior 1 (process 19273) detached]
SIGABRT: abort
PC=0xf0af09157608 m=8 sigcode=18446744073709551610
signal arrived during cgo execution
goroutine 13 gp=0x4000103a40 m=8 mp=0x4000339808 [syscall]:
runtime.cgocall(0xc4bb20d55594, 0x4000a030a8)
        runtime/cgocall.go:167 +0x44 fp=0x4000a03060 sp=0x4000a03020 pc=0xc4bb200f4914
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_reserve(0xf0ae950bab50, 0xf0ac32d49be0)
        _cgo_gotypes.go:1012 +0x34 fp=0x4000a030a0 sp=0x4000a03060 pc=0xc4bb2051e6c4
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve.func2(...)
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:850
github.com/ollama/ollama/ml/backend/ggml.(*Context).Reserve(0x40005b6080)
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:850 +0xe0 fp=0x4000a03330 sp=0x4000a030a0 pc=0xc4bb20528e70
github.com/ollama/ollama/runner/ollamarunner.(*Server).reserveWorstCaseGraph(0x400059a3c0, 0x1)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1168 +0x834 fp=0x4000a03660 sp=0x4000a03330 pc=0xc4bb205f1644
github.com/ollama/ollama/runner/ollamarunner.(*Server).allocModel(0x400059a3c0, {0xfffff92e5b36?, 0x0?}, {0x0, 0x14, {0x40000b9580, 0x1, 0x1}, 0x1}, {0x0?, ...}, ...)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1231 +0x2e4 fp=0x4000a03710 sp=0x4000a03660 pc=0xc4bb205f1d14
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0x400059a3c0, {0xc4bb215eaa00, 0x4000252000}, 0x4000196000)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1310 +0x460 fp=0x4000a03aa0 sp=0x4000a03710 pc=0xc4bb205f2620
github.com/ollama/ollama/runner/ollamarunner.(*Server).load-fm({0xc4bb215eaa00?, 0x4000252000?}, 0x4000165b28?)
        <autogenerated>:1 +0x40 fp=0x4000a03ad0 sp=0x4000a03aa0 pc=0xc4bb205f44d0
net/http.HandlerFunc.ServeHTTP(0x40005b58c0?, {0xc4bb215eaa00?, 0x4000252000?}, 0x4000165b10?)
        net/http/server.go:2294 +0x38 fp=0x4000a03b00 sp=0x4000a03ad0 pc=0xc4bb203b14c8
net/http.(*ServeMux).ServeHTTP(0x10?, {0xc4bb215eaa00, 0x4000252000}, 0x4000196000)
        net/http/server.go:2822 +0x1b4 fp=0x4000a03b50 sp=0x4000a03b00 pc=0xc4bb203b3054
net/http.serverHandler.ServeHTTP({0xc4bb215e6e10?}, {0xc4bb215eaa00?, 0x4000252000?}, 0x1?)
        net/http/server.go:3301 +0xbc fp=0x4000a03b80 sp=0x4000a03b50 pc=0xc4bb203ced3c
net/http.(*conn).serve(0x4000128480, {0xc4bb215ecec8, 0x4000124bd0})
        net/http/server.go:2102 +0x52c fp=0x4000a03fa0 sp=0x4000a03b80 pc=0xc4bb203afc6c
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3454 +0x30 fp=0x4000a03fd0 sp=0x4000a03fa0 pc=0xc4bb203b4e30
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000a03fd0 sp=0x4000a03fd0 pc=0xc4bb200ffd14
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3454 +0x3d8
goroutine 1 gp=0x40000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000a05720 sp=0x4000a05700 pc=0xc4bb200f7e28
runtime.netpollblock(0x7000000000?, 0x6?, 0x0?)
        runtime/netpoll.go:575 +0x158 fp=0x4000a05760 sp=0x4000a05720 pc=0xc4bb200bcde8
internal/poll.runtime_pollWait(0xf0af08e17f30, 0x72)
        runtime/netpoll.go:351 +0xa0 fp=0x4000a05790 sp=0x4000a05760 pc=0xc4bb200f6fe0
internal/poll.(*pollDesc).wait(0x40001f5e00?, 0xc4bb20180058?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x4000a057c0 sp=0x4000a05790 pc=0xc4bb201795f8
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x40001f5e00)
        internal/poll/fd_unix.go:620 +0x24c fp=0x4000a05870 sp=0x4000a057c0 pc=0xc4bb2017decc
net.(*netFD).accept(0x40001f5e00)
        net/fd_unix.go:172 +0x28 fp=0x4000a05930 sp=0x4000a05870 pc=0xc4bb201ec9d8
net.(*TCPListener).accept(0x40005b7340)
        net/tcpsock_posix.go:159 +0x24 fp=0x4000a05980 sp=0x4000a05930 pc=0xc4bb20201e74
net.(*TCPListener).Accept(0x40005b7340)
        net/tcpsock.go:380 +0x2c fp=0x4000a059c0 sp=0x4000a05980 pc=0xc4bb20200e0c
net/http.(*onceCloseListener).Accept(0x4000128480?)
        <autogenerated>:1 +0x30 fp=0x4000a059e0 sp=0x4000a059c0 pc=0xc4bb203db360
net/http.(*Server).Serve(0x4000117400, {0xc4bb215ea820, 0x40005b7340})
        net/http/server.go:3424 +0x290 fp=0x4000a05b10 sp=0x4000a059e0 pc=0xc4bb203b4aa0
github.com/ollama/ollama/runner/ollamarunner.Execute({0x40000320a0, 0x4, 0x4})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1446 +0x7fc fp=0x4000a05ce0 sp=0x4000a05b10 pc=0xc4bb205f3efc
github.com/ollama/ollama/runner.Execute({0x4000032080?, 0x0?, 0x0?})
        github.com/ollama/ollama/runner/runner.go:28 +0x190 fp=0x4000a05d10 sp=0x4000a05ce0 pc=0xc4bb206485b0
github.com/ollama/ollama/cmd.NewCLI.func3(0x4000117200?, {0xc4bb210610ff?, 0x4?, 0xc4bb21061103?})
        github.com/ollama/ollama/cmd/cmd.go:1979 +0x54 fp=0x4000a05d40 sp=0x4000a05d10 pc=0xc4bb20d05554
github.com/spf13/cobra.(*Command).execute(0x400012d508, {0x40005ad040, 0x5, 0x5})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x648 fp=0x4000a05e60 sp=0x4000a05d40 pc=0xc4bb2025c6d8
github.com/spf13/cobra.(*Command).ExecuteC(0x40005bb208)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320 fp=0x4000a05f20 sp=0x4000a05e60 pc=0xc4bb2025ce20
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        github.com/ollama/ollama/main.go:12 +0x54 fp=0x4000a05f40 sp=0x4000a05f20 pc=0xc4bb20d060a4
runtime.main()
        runtime/proc.go:283 +0x284 fp=0x4000a05fd0 sp=0x4000a05f40 pc=0xc4bb200c4194
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000a05fd0 sp=0x4000a05fd0 pc=0xc4bb200ffd14
goroutine 2 gp=0x4000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400008af90 sp=0x400008af70 pc=0xc4bb200f7e28
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.forcegchelper()
        runtime/proc.go:348 +0xb8 fp=0x400008afd0 sp=0x400008af90 pc=0xc4bb200c44e8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008afd0 sp=0x400008afd0 pc=0xc4bb200ffd14
created by runtime.init.7 in goroutine 1
        runtime/proc.go:336 +0x24
goroutine 3 gp=0x4000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400008b760 sp=0x400008b740 pc=0xc4bb200f7e28
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.bgsweep(0x40000b6000)
        runtime/mgcsweep.go:316 +0x108 fp=0x400008b7b0 sp=0x400008b760 pc=0xc4bb200aed18
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x28 fp=0x400008b7d0 sp=0x400008b7b0 pc=0xc4bb200a2b48
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008b7d0 sp=0x400008b7d0 pc=0xc4bb200ffd14
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x6c
goroutine 4 gp=0x4000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x71a226?, 0x3d576e20?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400008bf60 sp=0x400008bf40 pc=0xc4bb200f7e28
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.(*scavengerState).park(0xc4bb21ef2540)
        runtime/mgcscavenge.go:425 +0x5c fp=0x400008bf90 sp=0x400008bf60 pc=0xc4bb200ac7dc
runtime.bgscavenge(0x40000b6000)
        runtime/mgcscavenge.go:658 +0xac fp=0x400008bfb0 sp=0x400008bf90 pc=0xc4bb200acd5c
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x28 fp=0x400008bfd0 sp=0x400008bfb0 pc=0xc4bb200a2ae8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008bfd0 sp=0x400008bfd0 pc=0xc4bb200ffd14
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xac
goroutine 5 gp=0x4000003c00 m=nil [finalizer wait]:
runtime.gopark(0x18000001b8?, 0x1000000000000?, 0xf8?, 0xa5?, 0xc4bb203dddac?)
        runtime/proc.go:435 +0xc8 fp=0x400008a590 sp=0x400008a570 pc=0xc4bb200f7e28
runtime.runfinq()
        runtime/mfinal.go:196 +0x108 fp=0x400008a7d0 sp=0x400008a590 pc=0xc4bb200a1b48
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008a7d0 sp=0x400008a7d0 pc=0xc4bb200ffd14
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:166 +0x80
goroutine 6 gp=0x4000208700 m=nil [chan receive]:
runtime.gopark(0x40001a1e00?, 0x40005f6018?, 0x48?, 0xc7?, 0xc4bb201c4f98?)
        runtime/proc.go:435 +0xc8 fp=0x400008c6f0 sp=0x400008c6d0 pc=0xc4bb200f7e28
runtime.chanrecv(0x40000c4310, 0x0, 0x1)
        runtime/chan.go:664 +0x42c fp=0x400008c770 sp=0x400008c6f0 pc=0xc4bb20093bac
runtime.chanrecv1(0x0?, 0x0?)
        runtime/chan.go:506 +0x14 fp=0x400008c7a0 sp=0x400008c770 pc=0xc4bb20093744
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
        runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1799 +0x3c fp=0x400008c7d0 sp=0x400008c7a0 pc=0xc4bb200a5d6c
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008c7d0 sp=0x400008c7d0 pc=0xc4bb200ffd14
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1794 +0x78
goroutine 7 gp=0x4000208c40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400008cf10 sp=0x400008cef0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x400008cfb0 sp=0x400008cf10 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400008cfd0 sp=0x400008cfb0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008cfd0 sp=0x400008cfd0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 8 gp=0x4000208e00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400008d710 sp=0x400008d6f0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x400008d7b0 sp=0x400008d710 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400008d7d0 sp=0x400008d7b0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008d7d0 sp=0x400008d7d0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 18 gp=0x4000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000086710 sp=0x40000866f0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x40000867b0 sp=0x4000086710 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40000867d0 sp=0x40000867b0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000867d0 sp=0x40000867d0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 19 gp=0x4000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000086f10 sp=0x4000086ef0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x4000086fb0 sp=0x4000086f10 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x4000086fd0 sp=0x4000086fb0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000086fd0 sp=0x4000086fd0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 34 gp=0x4000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400050a710 sp=0x400050a6f0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x400050a7b0 sp=0x400050a710 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400050a7d0 sp=0x400050a7b0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400050a7d0 sp=0x400050a7d0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 35 gp=0x40005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400050af10 sp=0x400050aef0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x400050afb0 sp=0x400050af10 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400050afd0 sp=0x400050afb0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400050afd0 sp=0x400050afd0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 36 gp=0x4000504380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400050b710 sp=0x400050b6f0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x400050b7b0 sp=0x400050b710 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400050b7d0 sp=0x400050b7b0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400050b7d0 sp=0x400050b7d0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 37 gp=0x4000504540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400050bf10 sp=0x400050bef0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x400050bfb0 sp=0x400050bf10 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400050bfd0 sp=0x400050bfb0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400050bfd0 sp=0x400050bfd0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 9 gp=0x4000208fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400008df10 sp=0x400008def0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x400008dfb0 sp=0x400008df10 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400008dfd0 sp=0x400008dfb0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400008dfd0 sp=0x400008dfd0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 20 gp=0x4000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000087710 sp=0x40000876f0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x40000877b0 sp=0x4000087710 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40000877d0 sp=0x40000877b0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000877d0 sp=0x40000877d0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 21 gp=0x40001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000087f10 sp=0x4000087ef0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x4000087fb0 sp=0x4000087f10 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x4000087fd0 sp=0x4000087fb0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000087fd0 sp=0x4000087fd0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 22 gp=0x4000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0xc4bb21fc80c0?, 0x1?, 0x10?, 0x3d?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000088710 sp=0x40000886f0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x40000887b0 sp=0x4000088710 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40000887d0 sp=0x40000887b0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000887d0 sp=0x40000887d0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 23 gp=0x4000102c40 m=nil [GC worker (idle)]:
runtime.gopark(0x6cd9536b0767?, 0x1?, 0x70?, 0x3c?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000088f10 sp=0x4000088ef0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x4000088fb0 sp=0x4000088f10 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x4000088fd0 sp=0x4000088fb0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000088fd0 sp=0x4000088fd0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 24 gp=0x4000102e00 m=nil [GC worker (idle)]:
runtime.gopark(0x6cd9536b3607?, 0x1?, 0xd1?, 0x72?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000157f10 sp=0x4000157ef0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x4000157fb0 sp=0x4000157f10 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x4000157fd0 sp=0x4000157fb0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000157fd0 sp=0x4000157fd0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 25 gp=0x4000102fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x6cd9536b2fd7?, 0x1?, 0x17?, 0x74?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000089f10 sp=0x4000089ef0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x4000089fb0 sp=0x4000089f10 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x4000089fd0 sp=0x4000089fb0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000089fd0 sp=0x4000089fd0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 38 gp=0x4000504700 m=nil [GC worker (idle)]:
runtime.gopark(0x6cd9536b3997?, 0x1?, 0x54?, 0x24?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400050c710 sp=0x400050c6f0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x400050c7b0 sp=0x400050c710 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400050c7d0 sp=0x400050c7b0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400050c7d0 sp=0x400050c7d0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 10 gp=0x4000209180 m=nil [GC worker (idle)]:
runtime.gopark(0x6cd9536badc7?, 0x1?, 0xa1?, 0x33?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000506710 sp=0x40005066f0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x40005067b0 sp=0x4000506710 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x40005067d0 sp=0x40005067b0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40005067d0 sp=0x40005067d0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 39 gp=0x40005048c0 m=nil [GC worker (idle)]:
runtime.gopark(0x6cd9536bd5c7?, 0x1?, 0x22?, 0x45?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400050cf10 sp=0x400050cef0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x400050cfb0 sp=0x400050cf10 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400050cfd0 sp=0x400050cfb0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400050cfd0 sp=0x400050cfd0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 26 gp=0x4000103180 m=nil [GC worker (idle)]:
runtime.gopark(0x6cd9537bd43b?, 0x1?, 0x92?, 0x47?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400011c710 sp=0x400011c6f0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x400011c7b0 sp=0x400011c710 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x400011c7d0 sp=0x400011c7b0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400011c7d0 sp=0x400011c7d0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 11 gp=0x4000209340 m=nil [GC worker (idle)]:
runtime.gopark(0xc4bb21fc80c0?, 0x1?, 0x25?, 0xba?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x4000506f10 sp=0x4000506ef0 pc=0xc4bb200f7e28
runtime.gcBgMarkWorker(0x40000c5730)
        runtime/mgc.go:1423 +0xdc fp=0x4000506fb0 sp=0x4000506f10 pc=0xc4bb200a4fdc
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x28 fp=0x4000506fd0 sp=0x4000506fb0 pc=0xc4bb200a4ec8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000506fd0 sp=0x4000506fd0 pc=0xc4bb200ffd14
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x140
goroutine 12 gp=0x4000103880 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0xc4bb21f045c0?, 0x0?, 0xc0?, 0x20?, 0x0?)
        runtime/proc.go:435 +0xc8 fp=0x400009da90 sp=0x400009da70 pc=0xc4bb200f7e28
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.semacquire1(0x400059a478, 0x0, 0x1, 0x0, 0x18)
        runtime/sema.go:188 +0x204 fp=0x400009dae0 sp=0x400009da90 pc=0xc4bb200d8634
sync.runtime_SemacquireWaitGroup(0x0?)
        runtime/sema.go:110 +0x2c fp=0x400009db20 sp=0x400009dae0 pc=0xc4bb200f97dc
sync.(*WaitGroup).Wait(0x400059a470)
        sync/waitgroup.go:118 +0x70 fp=0x400009db40 sp=0x400009db20 pc=0xc4bb2010b3f0
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0x400059a3c0, {0xc4bb215ecf00, 0x40005ad0e0})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:441 +0x38 fp=0x400009dfa0 sp=0x400009db40 pc=0xc4bb205ec2c8
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1423 +0x30 fp=0x400009dfd0 sp=0x400009dfa0 pc=0xc4bb205f4120
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400009dfd0 sp=0x400009dfd0 pc=0xc4bb200ffd14
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1423 +0x448
goroutine 40 gp=0x4000103c00 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0xc8?, 0xdd?, 0xc4bb200981f0?)
        runtime/proc.go:435 +0xc8 fp=0x400050dd80 sp=0x400050dd60 pc=0xc4bb200f7e28
runtime.netpollblock(0x0?, 0xffffffff?, 0xff?)
        runtime/netpoll.go:575 +0x158 fp=0x400050ddc0 sp=0x400050dd80 pc=0xc4bb200bcde8
internal/poll.runtime_pollWait(0xf0af08e17e18, 0x72)
        runtime/netpoll.go:351 +0xa0 fp=0x400050ddf0 sp=0x400050ddc0 pc=0xc4bb200f6fe0
internal/poll.(*pollDesc).wait(0x40001f5e80?, 0x4000124cd1?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x400050de20 sp=0x400050ddf0 pc=0xc4bb201795f8
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x40001f5e80, {0x4000124cd1, 0x1, 0x1})
        internal/poll/fd_unix.go:165 +0x1fc fp=0x400050dec0 sp=0x400050de20 pc=0xc4bb2017a8ac
net.(*netFD).Read(0x40001f5e80, {0x4000124cd1?, 0x400050df58?, 0xc4bb203aa6e4?})
        net/fd_posix.go:55 +0x28 fp=0x400050df10 sp=0x400050dec0 pc=0xc4bb201eafa8
net.(*conn).Read(0x400008e940, {0x4000124cd1?, 0x0?, 0x0?})
        net/net.go:194 +0x34 fp=0x400050df60 sp=0x400050df10 pc=0xc4bb201f86e4
net/http.(*connReader).backgroundRead(0x4000124cc0)
        net/http/server.go:690 +0x40 fp=0x400050dfb0 sp=0x400050df60 pc=0xc4bb203aa5e0
net/http.(*connReader).startBackgroundRead.gowrap2()
        net/http/server.go:686 +0x28 fp=0x400050dfd0 sp=0x400050dfb0 pc=0xc4bb203aa4c8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400050dfd0 sp=0x400050dfd0 pc=0xc4bb200ffd14
created by net/http.(*connReader).startBackgroundRead in goroutine 13
        net/http/server.go:686 +0xc4
r0      0x0
r1      0x4b50
r2      0x6
r3      0xf0aebafdf140
r4      0xf0af09718b50
r5      0x1
r6      0x20
r7      0xf0aebafdd8d0
r8      0x83
r9      0x0
r10     0x70
r11     0x101010101010101
r12     0xf0aebafdd960
r13     0x0
r14     0x1
r15     0x1
r16     0x1
r17     0xf0af090f7d0c
r18     0xf8122e0c
r19     0x4b50
r20     0xf0aebafdf140
r21     0x6
r22     0xf0ae71e867a8
r23     0xf0aebafde690
r24     0xf0ac32d49790
r25     0xf0ae950bc2e8
r26     0xf0ac32d49790
r27     0xf0ac17ac15f0
r28     0x18
r29     0xf0aebafdd860
lr      0xf0af091575f4
sp      0xf0aebafdd850
pc      0xf0af09157608
fault   0x0
time=2026-02-06T20:45:52.864+09:00 level=ERROR source=server.go:1204 msg="do load request" error="Post \"http://127.0.0.1:40175/load\": EOF"
time=2026-02-06T20:45:52.865+09:00 level=ERROR source=server.go:1204 msg="do load request" error="Post \"http://127.0.0.1:40175/load\": dial tcp 127.0.0.1:40175: connect: connection refused"
time=2026-02-06T20:45:52.865+09:00 level=INFO source=sched.go:490 msg="Load failed" model=/usr/share/ollama/.ollama/models/blobs/sha256-65493e1f85b9ea4ba3ed793515fde13cbdbea7d74ad2c662b566b146eab0081e error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
time=2026-02-06T20:45:52.950+09:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 2"
[GIN] 2026/02/06 - 20:45:52 | 500 | 11.332028252s |       127.0.0.1 | POST     "/api/generate"
```
runtime/proc.go:435 +0xc8 fp=0x400008d710 sp=0x400008d6f0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x400008d7b0 sp=0x400008d710 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x400008d7d0 sp=0x400008d7b0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400008d7d0 sp=0x400008d7d0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 18 gp=0x4000102380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000086710 sp=0x40000866f0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x40000867b0 sp=0x4000086710 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x40000867d0 sp=0x40000867b0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x40000867d0 sp=0x40000867d0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 19 gp=0x4000102540 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000086f10 sp=0x4000086ef0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x4000086fb0 sp=0x4000086f10 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x4000086fd0 sp=0x4000086fb0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000086fd0 sp=0x4000086fd0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 34 gp=0x4000504000 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x400050a710 sp=0x400050a6f0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x400050a7b0 sp=0x400050a710 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x400050a7d0 sp=0x400050a7b0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400050a7d0 sp=0x400050a7d0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 35 gp=0x40005041c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x400050af10 sp=0x400050aef0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x400050afb0 sp=0x400050af10 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x400050afd0 sp=0x400050afb0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400050afd0 sp=0x400050afd0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 36 gp=0x4000504380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x400050b710 sp=0x400050b6f0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x400050b7b0 sp=0x400050b710 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x400050b7d0 sp=0x400050b7b0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400050b7d0 sp=0x400050b7d0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 37 gp=0x4000504540 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xc8 fp=0x400050bf10 sp=0x400050bef0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x400050bfb0 sp=0x400050bf10 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x400050bfd0 sp=0x400050bfb0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400050bfd0 sp=0x400050bfd0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 9 gp=0x4000208fc0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x400008df10 sp=0x400008def0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x400008dfb0 sp=0x400008df10 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x400008dfd0 sp=0x400008dfb0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400008dfd0 sp=0x400008dfd0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 20 gp=0x4000102700 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000087710 sp=0x40000876f0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x40000877b0 sp=0x4000087710 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x40000877d0 sp=0x40000877b0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x40000877d0 sp=0x40000877d0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 21 gp=0x40001028c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000087f10 sp=0x4000087ef0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x4000087fb0 sp=0x4000087f10 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x4000087fd0 sp=0x4000087fb0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000087fd0 sp=0x4000087fd0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 22 gp=0x4000102a80 m=nil [GC worker (idle)]: runtime.gopark(0xc4bb21fc80c0?, 0x1?, 0x10?, 0x3d?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000088710 sp=0x40000886f0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x40000887b0 sp=0x4000088710 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x40000887d0 sp=0x40000887b0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x40000887d0 sp=0x40000887d0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 23 gp=0x4000102c40 m=nil [GC worker (idle)]: runtime.gopark(0x6cd9536b0767?, 0x1?, 0x70?, 0x3c?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000088f10 sp=0x4000088ef0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x4000088fb0 sp=0x4000088f10 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x4000088fd0 sp=0x4000088fb0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000088fd0 sp=0x4000088fd0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 24 gp=0x4000102e00 m=nil [GC worker (idle)]: runtime.gopark(0x6cd9536b3607?, 0x1?, 0xd1?, 0x72?, 0x0?) 
runtime/proc.go:435 +0xc8 fp=0x4000157f10 sp=0x4000157ef0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x4000157fb0 sp=0x4000157f10 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x4000157fd0 sp=0x4000157fb0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000157fd0 sp=0x4000157fd0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 25 gp=0x4000102fc0 m=nil [GC worker (idle)]: runtime.gopark(0x6cd9536b2fd7?, 0x1?, 0x17?, 0x74?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000089f10 sp=0x4000089ef0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x4000089fb0 sp=0x4000089f10 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x4000089fd0 sp=0x4000089fb0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000089fd0 sp=0x4000089fd0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 38 gp=0x4000504700 m=nil [GC worker (idle)]: runtime.gopark(0x6cd9536b3997?, 0x1?, 0x54?, 0x24?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x400050c710 sp=0x400050c6f0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x400050c7b0 sp=0x400050c710 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x400050c7d0 sp=0x400050c7b0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400050c7d0 sp=0x400050c7d0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 10 gp=0x4000209180 m=nil [GC worker (idle)]: runtime.gopark(0x6cd9536badc7?, 0x1?, 0xa1?, 0x33?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000506710 sp=0x40005066f0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x40005067b0 sp=0x4000506710 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x40005067d0 sp=0x40005067b0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x40005067d0 sp=0x40005067d0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 39 gp=0x40005048c0 m=nil [GC worker (idle)]: runtime.gopark(0x6cd9536bd5c7?, 0x1?, 0x22?, 0x45?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x400050cf10 sp=0x400050cef0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x400050cfb0 sp=0x400050cf10 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x400050cfd0 sp=0x400050cfb0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400050cfd0 sp=0x400050cfd0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 26 gp=0x4000103180 m=nil [GC worker (idle)]: runtime.gopark(0x6cd9537bd43b?, 0x1?, 0x92?, 0x47?, 0x0?) 
runtime/proc.go:435 +0xc8 fp=0x400011c710 sp=0x400011c6f0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x400011c7b0 sp=0x400011c710 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x400011c7d0 sp=0x400011c7b0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400011c7d0 sp=0x400011c7d0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 11 gp=0x4000209340 m=nil [GC worker (idle)]: runtime.gopark(0xc4bb21fc80c0?, 0x1?, 0x25?, 0xba?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x4000506f10 sp=0x4000506ef0 pc=0xc4bb200f7e28 runtime.gcBgMarkWorker(0x40000c5730) runtime/mgc.go:1423 +0xdc fp=0x4000506fb0 sp=0x4000506f10 pc=0xc4bb200a4fdc runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x28 fp=0x4000506fd0 sp=0x4000506fb0 pc=0xc4bb200a4ec8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x4000506fd0 sp=0x4000506fd0 pc=0xc4bb200ffd14 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x140 goroutine 12 gp=0x4000103880 m=nil [sync.WaitGroup.Wait]: runtime.gopark(0xc4bb21f045c0?, 0x0?, 0xc0?, 0x20?, 0x0?) runtime/proc.go:435 +0xc8 fp=0x400009da90 sp=0x400009da70 pc=0xc4bb200f7e28 runtime.goparkunlock(...) runtime/proc.go:441 runtime.semacquire1(0x400059a478, 0x0, 0x1, 0x0, 0x18) runtime/sema.go:188 +0x204 fp=0x400009dae0 sp=0x400009da90 pc=0xc4bb200d8634 sync.runtime_SemacquireWaitGroup(0x0?) runtime/sema.go:110 +0x2c fp=0x400009db20 sp=0x400009dae0 pc=0xc4bb200f97dc sync.(*WaitGroup).Wait(0x400059a470) sync/waitgroup.go:118 +0x70 fp=0x400009db40 sp=0x400009db20 pc=0xc4bb2010b3f0 github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0x400059a3c0, {0xc4bb215ecf00, 0x40005ad0e0}) github.com/ollama/ollama/runner/ollamarunner/runner.go:441 +0x38 fp=0x400009dfa0 sp=0x400009db40 pc=0xc4bb205ec2c8 github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1() github.com/ollama/ollama/runner/ollamarunner/runner.go:1423 +0x30 fp=0x400009dfd0 sp=0x400009dfa0 pc=0xc4bb205f4120 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400009dfd0 sp=0x400009dfd0 pc=0xc4bb200ffd14 created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/ollamarunner/runner.go:1423 +0x448 goroutine 40 gp=0x4000103c00 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0xc8?, 0xdd?, 0xc4bb200981f0?) runtime/proc.go:435 +0xc8 fp=0x400050dd80 sp=0x400050dd60 pc=0xc4bb200f7e28 runtime.netpollblock(0x0?, 0xffffffff?, 0xff?) runtime/netpoll.go:575 +0x158 fp=0x400050ddc0 sp=0x400050dd80 pc=0xc4bb200bcde8 internal/poll.runtime_pollWait(0xf0af08e17e18, 0x72) runtime/netpoll.go:351 +0xa0 fp=0x400050ddf0 sp=0x400050ddc0 pc=0xc4bb200f6fe0 internal/poll.(*pollDesc).wait(0x40001f5e80?, 0x4000124cd1?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x400050de20 sp=0x400050ddf0 pc=0xc4bb201795f8 internal/poll.(*pollDesc).waitRead(...) 
internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0x40001f5e80, {0x4000124cd1, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x1fc fp=0x400050dec0 sp=0x400050de20 pc=0xc4bb2017a8ac net.(*netFD).Read(0x40001f5e80, {0x4000124cd1?, 0x400050df58?, 0xc4bb203aa6e4?}) net/fd_posix.go:55 +0x28 fp=0x400050df10 sp=0x400050dec0 pc=0xc4bb201eafa8 net.(*conn).Read(0x400008e940, {0x4000124cd1?, 0x0?, 0x0?}) net/net.go:194 +0x34 fp=0x400050df60 sp=0x400050df10 pc=0xc4bb201f86e4 net/http.(*connReader).backgroundRead(0x4000124cc0) net/http/server.go:690 +0x40 fp=0x400050dfb0 sp=0x400050df60 pc=0xc4bb203aa5e0 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:686 +0x28 fp=0x400050dfd0 sp=0x400050dfb0 pc=0xc4bb203aa4c8 runtime.goexit({}) runtime/asm_arm64.s:1223 +0x4 fp=0x400050dfd0 sp=0x400050dfd0 pc=0xc4bb200ffd14 created by net/http.(*connReader).startBackgroundRead in goroutine 13 net/http/server.go:686 +0xc4 r0 0x0 r1 0x4b50 r2 0x6 r3 0xf0aebafdf140 r4 0xf0af09718b50 r5 0x1 r6 0x20 r7 0xf0aebafdd8d0 r8 0x83 r9 0x0 r10 0x70 r11 0x101010101010101 r12 0xf0aebafdd960 r13 0x0 r14 0x1 r15 0x1 r16 0x1 r17 0xf0af090f7d0c r18 0xf8122e0c r19 0x4b50 r20 0xf0aebafdf140 r21 0x6 r22 0xf0ae71e867a8 r23 0xf0aebafde690 r24 0xf0ac32d49790 r25 0xf0ae950bc2e8 r26 0xf0ac32d49790 r27 0xf0ac17ac15f0 r28 0x18 r29 0xf0aebafdd860 lr 0xf0af091575f4 sp 0xf0aebafdd850 pc 0xf0af09157608 fault 0x0 time=2026-02-06T20:45:52.864+09:00 level=ERROR source=server.go:1204 msg="do load request" error="Post \"http://127.0.0.1:40175/load\": EOF" time=2026-02-06T20:45:52.865+09:00 level=ERROR source=server.go:1204 msg="do load request" error="Post \"http://127.0.0.1:40175/load\": dial tcp 127.0.0.1:40175: connect: connection refused" time=2026-02-06T20:45:52.865+09:00 level=INFO source=sched.go:490 msg="Load failed" model=/usr/share/ollama/.ollama/models/blobs/sha256-65493e1f85b9ea4ba3ed793515fde13cbdbea7d74ad2c662b566b146eab0081e error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details" time=2026-02-06T20:45:52.950+09:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 2" [GIN] 2026/02/06 - 20:45:52 | 500 | 11.332028252s | 127.0.0.1 | POST "/api/generate" ``` </details>

@xuancong84 commented on GitHub (Feb 19, 2026):

I have also encountered the same issue for gpt-oss-120b: `OLLAMA_NUM_PARALLEL=7` works, `OLLAMA_NUM_PARALLEL=8` crashes.

I wonder why the developers use a signed 32-bit integer for the maximum tensor size. If it cannot be changed to a 64-bit integer, they could store a block count instead: keep the signed 32-bit field, but let a negative value mean the size in 1 KiB blocks. That would allow single-tensor copies of up to 2 TB, which should be enough for some time.
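
For illustration, a minimal sketch of that encoding idea, with hypothetical helper names (`encode_size`/`decode_size` are not part of ggml's actual API):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the suggestion above: keep the 32-bit field, but let negative
// values carry the size in 1 KiB blocks instead of bytes. 2^31 blocks of
// 1 KiB gives a ceiling of roughly 2 TiB per tensor copy.
int32_t encode_size(int64_t nbytes) {
    if (nbytes <= INT32_MAX) {
        return (int32_t)nbytes;                    // small sizes: plain byte count, as today
    }
    const int64_t blocks = (nbytes + 1023) / 1024; // round up to whole 1 KiB blocks
    assert(blocks <= INT32_MAX);                   // beyond ~2 TiB even this scheme overflows
    return (int32_t)(-blocks);                     // negative marks "block units"
}

int64_t decode_size(int32_t v) {
    return v >= 0 ? (int64_t)v : -(int64_t)v * 1024;
}
```

Note that rounding up to whole blocks means the decoded value can overstate the true size by up to 1023 bytes, so it works as a capacity bound but not as an exact copy length.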

**Below is the analysis from ChatGPT 5.2, which I believe gives significant insight for the developers:**

Your log basically answers it:

Ollama sees both GPUs and sums VRAM: total_vram="285.8 GiB" → so it picks a huge default context: default_num_ctx=262144 (256k)

Then it clamps because the model was trained for less:
requested context size too large for model num_ctx=262144 n_ctx_train=131072

With OLLAMA_NUM_PARALLEL=8, Ollama multiplies the KV reservation:
Parallel:8 … KvSize:1048576

And 1048576 = 131072 × 8 (your model’s max ctx × parallel). That means it’s trying to reserve KV/cache tensors for ~1 million tokens total.

Why it fails at parallel=8 (even though you have 2×144GB)

It’s not an out-of-VRAM failure. It’s a ggml-cuda kernel limit/bug:

ggml-cuda/cpy.cu:396: GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed

That assert triggers when a tensor copy exceeds the signed 32-bit limit (~2.14GB). With KV for ~1M tokens, some buffers become >2GB, so the runner aborts. This exact crash has been reported on GH200 144GB systems too.
Upstream llama.cpp/ggml has a PR specifically addressing INT_MAX overflow in the CUDA copy kernels, which is the same root cause.

So the bottleneck is: “single tensor >2GB”, not “total VRAM available”.
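
A back-of-envelope check of that arithmetic, assuming a hypothetical per-layer KV tensor of 1024 F16 values per token (the real width depends on the model's KV heads and head dimension), shows why 7 slots can fit where 8 cannot:

```cpp
#include <climits>
#include <cstdint>
#include <cstdio>

int main() {
    const int64_t n_ctx_train     = 131072;   // model's trained context (from the logs)
    const int64_t bytes_per_token = 1024 * 2; // assumed: 1024 F16 values per layer per token
    for (int parallel = 7; parallel <= 8; ++parallel) {
        const int64_t kv_size = n_ctx_train * parallel;    // tokens reserved (matches KvSize in the logs)
        const int64_t nbytes  = kv_size * bytes_per_token; // size of one per-layer KV tensor
        printf("parallel=%d kv_size=%lld nbytes=%lld (%s INT_MAX)\n",
               parallel, (long long)kv_size, (long long)nbytes,
               nbytes <= INT_MAX ? "fits under" : "exceeds");
    }
}
// parallel=7 kv_size=917504  nbytes=1879048192 (fits under INT_MAX)
// parallel=8 kv_size=1048576 nbytes=2147483648 (exceeds INT_MAX by exactly 1 byte)
```

Under these assumed numbers, the 7-slot reservation stays just under INT_MAX and the 8-slot one lands exactly one byte over, mirroring the original report; a model with a wider per-token KV cache crosses the limit sooner, which would explain the F16 glmocr run in the log above crashing even at Parallel:7.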


@without-ordinary commented on GitHub (Feb 23, 2026):

> I have also encountered the same issue for gpt-oss-120b: `OLLAMA_NUM_PARALLEL=7` works, `OLLAMA_NUM_PARALLEL=8` crashes.

I'm encountering this issue as well. The fun part is that it worked fine for many hours after install and only started crashing later. I think that's because smaller models were already running when the larger-context models were loaded, so the bug wasn't triggered; once all of the VRAM was available, the issue could occur.

So I guess my workaround options are: force-load another model with a far smaller max context first to burn some VRAM, decrease parallel by 1, or set `OLLAMA_CONTEXT_LENGTH` to a number slightly below `n_ctx_train` (e.g. 131000 vs 131072).

Reference: github-starred/ollama#9087