[GH-ISSUE #13630] Regression in Ollama ≥0.13.x: GGML_ASSERT crash with scb10x/typhoon-ocr1.5-3b:latest / Qwen2.5-VL models on CUDA (works on 0.12.x) #34727

Open
opened 2026-04-22 18:32:58 -05:00 by GiteaMirror · 7 comments

Originally created by @klemonade on GitHub (Jan 6, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13630

What is the issue?

Description

After upgrading Ollama from 0.12.x to 0.13.x (including the latest release), GPU inference becomes unstable and frequently crashes with a GGML assertion error when using scb10x/typhoon-ocr1.5-3b:latest (a Qwen2.5-VL–based model).

Docker compose

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    labels:
      - traefik.http.services.home-server-ollamatyphoon-eqj74q-15-web.loadbalancer.forwardingTimeouts.responseHeaderTimeout=1h
    ports:
      - "11434:11434"       # Ollama API port
    volumes:
      - ollama-data:/root/.ollama
    environment:
      OLLAMA_API_KEY: "your_api_key_here"  # Optional for self-hosted setups
      OLLAMA_LOAD_TIMEOUT: 1800000 # 30 minutes
      # OLLAMA_GPU_LAYERS: 30

    deploy:
      resources:
        limits:
          cpus: "8.0"
          memory: 16G
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    runtime: nvidia

  openwebui:
    image: ghcr.io/open-webui/open-webui:0.6.40
    container_name: openwebui
    restart: unless-stopped
    ports:
      - "8090:8080"
    environment:
      OLLAMA_API_URL: "http://ollama:11434"
      ENABLE_WEBSOCKET_SUPPORT: false
      
    depends_on:
      - ollama

volumes:
  ollama-data:

Tried

  • Changed the Ollama version from latest to 0.13.4 => still crashes
  • Changed Open WebUI from latest to 0.6.40 (I don't think this is relevant)
  • Downgraded the GPU driver from 580 to 535 => still crashes
  • Deleted the volume and image and recreated the container with a freshly pulled model => still crashes

Current workaround

  • Downgrade ollama to 0.12.11

Host Machine Detail

Operating System: Ubuntu 22.04.5 LTS              
Kernel: Linux 6.8.0-90-generic
Hardware Model: B450 AORUS PRO WIFI
CPU: AMD Ryzen 5 3600 6-Core Processor
GPU: NVIDIA GeForce RTX 3060 (12G)
GPU Driver Version: 535.274.02
CUDA Version: 12.2
Memory Detail:
     *-bank:0
          description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
          product: CMK16GX4M2A2666C16
     *-bank:1
          description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
          product: KHX2666C16/8G
     *-bank:2
          description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
          product: CMK16GX4M2A2666C16
     *-bank:3
          description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
          product: KHX2666C16/8G
NVIDIA Docker runtime: io.containerd.runc.v2 nvidia runc

Happy to test fixes or provide additional logs if needed. Thank you.

Relevant log output

ggml.c:4081: GGML_ASSERT(a->ne[2] * 4 == b->ne[0]) failed
/usr/bin/ollama(+0x1103378)[0x6141b09b6378]
/usr/bin/ollama(+0x1103757)[0x6141b09b6757]
/usr/bin/ollama(+0x11038dd)[0x6141b09b68dd]
/usr/bin/ollama(+0x110b9be)[0x6141b09be9be]
/usr/bin/ollama(+0x10bae51)[0x6141b096de51]
/usr/bin/ollama(+0x37e461)[0x6141afc31461]
SIGABRT: abort
PC=0x70ed7656bb2c m=13 sigcode=18446744073709551610
signal arrived during cgo execution
goroutine 6 gp=0xc0005048c0 m=13 mp=0xc0002f8808 [syscall]:
runtime.cgocall(0x6141b096de00, 0xc000d83138)
runtime/cgocall.go:167 +0x4b fp=0xc000d83110 sp=0xc000d830d8 pc=0x6141afc264cb
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_rope_multi(0x70ecfc003090, 0x70ecfc8f9750, 0x70ecfc8f95e0, 0x0, 0x80, 0xc0007470a0, 0x8, 0x20000, 0x49742400, 0x3f800000, ...)
_cgo_gotypes.go:2067 +0x4b fp=0xc000d83138 sp=0xc000d83110 pc=0x6141b006092b
github.com/ollama/ollama/ml/backend/ggml.(*Tensor).RoPE.func2(...)
github.com/ollama/ollama/ml/backend/ggml/ggml.go:1543
github.com/ollama/ollama/ml/backend/ggml.(*Tensor).RoPE(0xc0002d07c8, {0x6141b11bcff0, 0xc0002f6940}, {0x6141b11c77a0, 0xc0002d07b0}, 0x80, 0x49742400, 0x3f800000, {0xc0005f8e50, 0x1, ...})
github.com/ollama/ollama/ml/backend/ggml/ggml.go:1543 +0x61a fp=0xc000d83290 sp=0xc000d83138 pc=0x6141b00709fa
github.com/ollama/ollama/ml/nn.RoPE({0x6141b11bcff0?, 0xc0002f6940?}, {0x6141b11c77a0?, 0xc0002d07c8?}, {0x6141b11c77a0?, 0xc0002d07b0?}, 0x6141b0070116?, 0xfc003090?, 0x70ec?, {0xc0005f8e50, ...})
github.com/ollama/ollama/ml/nn/rope.go:16 +0x86 fp=0xc000d832f0 sp=0xc000d83290 pc=0x6141b00a3a86
github.com/ollama/ollama/model/models/qwen25vl.TextOptions.applyRotaryPositionEmbeddings({0x800, 0x10, 0x2, 0x80, 0x1f400, 0x358637bd, 0x49742400, 0x3f800000, {0xc0042e1848, 0x3, ...}}, ...)
github.com/ollama/ollama/model/models/qwen25vl/model_text.go:21 +0x172 fp=0xc000d83378 sp=0xc000d832f0 pc=0x6141b012a772
github.com/ollama/ollama/model/models/qwen25vl.(*TextModel).Shift(...)
github.com/ollama/ollama/model/models/qwen25vl/model_text.go:94
github.com/ollama/ollama/model/models/qwen25vl.(*TextModel).Shift-fm({0x6141b11bcff0?, 0xc0002f6940?}, 0xc0002f6940?, {0x6141b11c77a0?, 0xc0002d07c8?}, {0x6141b11c77a0?, 0xc0002d07b0?})
<autogenerated>:1 +0x14f fp=0xc000d83460 sp=0xc000d83378 pc=0x6141b013020f
github.com/ollama/ollama/kvcache.(*Causal).shift(0xc0001f9700, 0x0, 0x4, 0xfffff4c4)
github.com/ollama/ollama/kvcache/causal.go:612 +0x507 fp=0xc000d835c0 sp=0xc000d83460 pc=0x6141b0057707
github.com/ollama/ollama/kvcache.(*Causal).Remove(0xc0001f9700, 0x0, 0x4, 0xb40)
github.com/ollama/ollama/kvcache/causal.go:672 +0x285 fp=0xc000d83658 sp=0xc000d835c0 pc=0x6141b0057ae5
github.com/ollama/ollama/runner/ollamarunner.(*InputCache).ShiftCacheSlot(0xc0043bd140, 0xc0043bd100, 0x4)
github.com/ollama/ollama/runner/ollamarunner/cache.go:290 +0x34c fp=0xc000d837f0 sp=0xc000d83658 pc=0x6141b013ebac
github.com/ollama/ollama/runner/ollamarunner.(*Server).forwardBatch(_, {0x4bd, {0x6141b11bcff0, 0xc0002f6780}, {0x6141b11c77a0, 0xc0002d0750}, {0xc0005f8bf8, 0x1, 0x1}, {{0x6141b11c77a0, ...}, ...}, ...})
github.com/ollama/ollama/runner/ollamarunner/runner.go:565 +0xec5 fp=0xc000d83b58 sp=0xc000d837f0 pc=0x6141b0142345
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc00022b0e0, {0x6141b11b2310, 0xc0003865f0})
github.com/ollama/ollama/runner/ollamarunner/runner.go:452 +0x18c fp=0xc000d83fb8 sp=0xc000d83b58 pc=0x6141b014122c
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
github.com/ollama/ollama/runner/ollamarunner/runner.go:1418 +0x28 fp=0xc000d83fe0 sp=0xc000d83fb8 pc=0x6141b014a928
runtime.goexit({})
.
.
.
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005f9fe8 sp=0xc0005f9fe0 pc=0x5d94503b0a01
created by github.com/ollama/ollama/runner/ollamarunner.(*Server).run in goroutine 23
github.com/ollama/ollama/runner/ollamarunner/runner.go:458 +0x2cd
rax    0x0
rbx    0xa25
rcx    0x70e73559bb2c
rdx    0x6
rdi    0xa1f
rsi    0xa25
rbp    0x70e6d7ffe330
rsp    0x70e6d7ffe2f0
r8     0x0
r9     0x7
r10    0x8
r11    0x246
r12    0x6
r13    0x5d945163dfdc
r14    0x16
r15    0x49742400
rip    0x70e73559bb2c
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
time=2026-01-05T23:37:22.020Z level=ERROR source=server.go:302 msg="llama runner terminated" error="exit status 2"
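
For context, the assertion that fails (`ggml.c:4081: GGML_ASSERT(a->ne[2] * 4 == b->ne[0])`) fires inside `ggml_rope_multi`, reached from the qwen25vl `Shift` path when the KV cache is shifted. A rough restatement of what that invariant checks, under the assumption (illustrative Python, not the ggml source) that `a->ne[2]` is the number of tokens and `b` is the positions tensor, which for multi-section (M-RoPE) rotary embeddings must carry 4 position components per token:

```python
def check_mrope_positions(n_tokens: int, positions: list[int]) -> None:
    """Illustrative restatement of the GGML_ASSERT in ggml_rope_multi:
    for multi-section RoPE the positions buffer must hold 4 entries per
    token, i.e. a->ne[2] * 4 == b->ne[0]. (Names and the exact meaning of
    the 4 components are assumptions for illustration.)"""
    if len(positions) != 4 * n_tokens:
        raise AssertionError(
            f"GGML_ASSERT(a->ne[2] * 4 == b->ne[0]) failed: "
            f"{n_tokens} tokens but {len(positions)} position entries"
        )

# A batch of 3 tokens with 4 position components each satisfies the check:
check_mrope_positions(3, list(range(12)))

# A positions buffer sized for plain 1-D RoPE (one entry per token)
# violates the same invariant that aborts the runner in the log above:
try:
    check_mrope_positions(3, [0, 1, 2])
except AssertionError:
    pass
```

If the 0.13.x shift path builds a 1-entry-per-token positions buffer while the model expects the 4-section layout, this assertion would fire exactly as in the trace; that is a hypothesis consistent with the stack, not a confirmed diagnosis.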

OS

Docker

GPU

Nvidia

CPU

AMD

Ollama version

0.13.5

GiteaMirror added the bug label 2026-04-22 18:32:58 -05:00

@rick-github commented on GitHub (Jan 6, 2026):

Post the full server log.

<!-- gh-comment-id:3714409483 -->

@klemonade commented on GitHub (Jan 6, 2026):

Here is the full log from a reproduction.

2026-01-06T13:55:54.804Z Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
2026-01-06T13:55:54.806Z Your new public key is:
2026-01-06T13:55:54.806Z ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIM+Cw6oyZnUE6gThwPnQyAoRmHq995u4zW+EfZDOM1cM
2026-01-06T13:55:54.806Z time=2026-01-06T13:55:54.806Z level=INFO source=routes.go:1554 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:500h0m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
2026-01-06T13:55:54.806Z time=2026-01-06T13:55:54.806Z level=INFO source=images.go:493 msg="total blobs: 0"
2026-01-06T13:55:54.807Z time=2026-01-06T13:55:54.807Z level=INFO source=images.go:500 msg="total unused blobs removed: 0"
2026-01-06T13:55:54.807Z time=2026-01-06T13:55:54.807Z level=INFO source=routes.go:1607 msg="Listening on [::]:11434 (version 0.13.5)"
2026-01-06T13:55:54.807Z time=2026-01-06T13:55:54.807Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
2026-01-06T13:55:54.808Z time=2026-01-06T13:55:54.808Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35083"
2026-01-06T13:55:54.928Z time=2026-01-06T13:55:54.928Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41097"
2026-01-06T13:55:55.021Z time=2026-01-06T13:55:55.020Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
2026-01-06T13:55:55.021Z time=2026-01-06T13:55:55.021Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 33135"
2026-01-06T13:55:55.159Z time=2026-01-06T13:55:55.159Z level=INFO source=types.go:42 msg="inference compute" id=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3060" libdirs=ollama,cuda_v12 driver=12.2 pci_id=0000:07:00.0 type=discrete total="12.0 GiB" available="11.4 GiB"
2026-01-06T13:55:55.159Z time=2026-01-06T13:55:55.159Z level=INFO source=routes.go:1648 msg="entering low vram mode" "total vram"="12.0 GiB" threshold="20.0 GiB"
2026-01-06T13:57:33.748Z [GIN] 2026/01/06 - 13:57:33 | 200 |      45.586µs |       127.0.0.1 | HEAD     "/"
2026-01-06T13:57:35.620Z time=2026-01-06T13:57:35.620Z level=INFO source=download.go:177 msg="downloading df8b6415ce11 in 16 200 MB part(s)"
2026-01-06T13:58:07.620Z time=2026-01-06T13:58:07.619Z level=INFO source=download.go:177 msg="downloading a242d8dfdc8f in 1 487 B part(s)"
2026-01-06T13:58:09.208Z time=2026-01-06T13:58:09.208Z level=INFO source=download.go:177 msg="downloading 75357d685f23 in 1 28 B part(s)"
2026-01-06T13:58:11.500Z time=2026-01-06T13:58:11.500Z level=INFO source=download.go:177 msg="downloading 832dd9e00a68 in 1 11 KB part(s)"
2026-01-06T13:58:13.090Z time=2026-01-06T13:58:13.090Z level=INFO source=download.go:177 msg="downloading 401a79d3fd09 in 1 41 B part(s)"
2026-01-06T13:58:14.764Z time=2026-01-06T13:58:14.764Z level=INFO source=download.go:177 msg="downloading 9e7b6c15f976 in 1 567 B part(s)"
2026-01-06T13:58:18.121Z [GIN] 2026/01/06 - 13:58:18 | 200 | 44.372395303s |       127.0.0.1 | POST     "/api/pull"
2026-01-06T13:59:29.038Z [GIN] 2026/01/06 - 13:59:29 | 200 |      52.729µs |      10.0.1.116 | GET      "/api/version"
2026-01-06T13:59:30.259Z [GIN] 2026/01/06 - 13:59:30 | 200 |     603.037µs |      10.0.1.116 | GET      "/api/tags"
2026-01-06T13:59:30.260Z [GIN] 2026/01/06 - 13:59:30 | 200 |      72.116µs |      10.0.1.116 | GET      "/api/ps"
2026-01-06T13:59:33.027Z [GIN] 2026/01/06 - 13:59:33 | 200 |     461.259µs |      10.0.1.116 | GET      "/api/tags"
2026-01-06T13:59:33.029Z [GIN] 2026/01/06 - 13:59:33 | 200 |      22.692µs |      10.0.1.116 | GET      "/api/ps"
2026-01-06T13:59:37.455Z [GIN] 2026/01/06 - 13:59:37 | 200 |      498.91µs |      10.0.1.116 | GET      "/api/tags"
2026-01-06T13:59:37.457Z [GIN] 2026/01/06 - 13:59:37 | 200 |      24.917µs |      10.0.1.116 | GET      "/api/ps"
2026-01-06T14:00:32.583Z time=2026-01-06T14:00:32.583Z level=INFO source=routes.go:1554 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:500h0m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
2026-01-06T14:00:32.583Z time=2026-01-06T14:00:32.583Z level=INFO source=images.go:493 msg="total blobs: 6"
2026-01-06T14:00:32.583Z time=2026-01-06T14:00:32.583Z level=INFO source=images.go:500 msg="total unused blobs removed: 0"
2026-01-06T14:00:32.584Z time=2026-01-06T14:00:32.584Z level=INFO source=routes.go:1607 msg="Listening on [::]:11434 (version 0.13.5)"
2026-01-06T14:00:32.584Z time=2026-01-06T14:00:32.584Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
2026-01-06T14:00:32.585Z time=2026-01-06T14:00:32.585Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40247"
2026-01-06T14:00:32.714Z time=2026-01-06T14:00:32.714Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 45287"
2026-01-06T14:00:32.814Z time=2026-01-06T14:00:32.814Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
2026-01-06T14:00:32.814Z time=2026-01-06T14:00:32.814Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44955"
2026-01-06T14:00:32.977Z time=2026-01-06T14:00:32.977Z level=INFO source=types.go:42 msg="inference compute" id=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3060" libdirs=ollama,cuda_v12 driver=12.2 pci_id=0000:07:00.0 type=discrete total="12.0 GiB" available="11.4 GiB"
2026-01-06T14:00:32.977Z time=2026-01-06T14:00:32.977Z level=INFO source=routes.go:1648 msg="entering low vram mode" "total vram"="12.0 GiB" threshold="20.0 GiB"
2026-01-06T14:15:14.773Z [GIN] 2026/01/06 - 14:15:14 | 200 |       58.99µs |   161.35.58.159 | GET      "/"
2026-01-06T14:15:46.587Z [GIN] 2026/01/06 - 14:15:46 | 200 |     116.718µs |   161.35.58.159 | GET      "/api/ps"
2026-01-06T14:15:47.190Z [GIN] 2026/01/06 - 14:15:47 | 200 |     487.411µs |   161.35.58.159 | GET      "/v1/models"
2026-01-06T14:15:47.815Z [GIN] 2026/01/06 - 14:15:47 | 200 |   81.399516ms |   161.35.58.159 | POST     "/api/show"
2026-01-06T14:41:22.726Z [GIN] 2026/01/06 - 14:41:22 | 200 |     423.094µs |      10.0.1.184 | GET      "/api/tags"
2026-01-06T14:43:03.394Z time=2026-01-06T14:43:03.394Z level=INFO source=download.go:177 msg="downloading b36530292268 in 16 156 MB part(s)"
2026-01-06T14:43:26.906Z time=2026-01-06T14:43:26.906Z level=INFO source=download.go:177 msg="downloading 636353bf6b2f in 1 1.4 KB part(s)"
2026-01-06T14:43:28.598Z time=2026-01-06T14:43:28.598Z level=INFO source=download.go:177 msg="downloading d18a5cc71b84 in 1 11 KB part(s)"
2026-01-06T14:43:30.301Z time=2026-01-06T14:43:30.301Z level=INFO source=download.go:177 msg="downloading 25b023c48a6b in 1 111 B part(s)"
2026-01-06T14:43:34.370Z time=2026-01-06T14:43:34.370Z level=INFO source=download.go:177 msg="downloading 9d085367cf15 in 1 487 B part(s)"
2026-01-06T14:43:37.713Z [GIN] 2026/01/06 - 14:43:37 | 200 | 35.904312205s |      10.0.1.184 | POST     "/api/pull"
2026-01-06T14:43:37.748Z [GIN] 2026/01/06 - 14:43:37 | 200 |     654.441µs |      10.0.1.184 | GET      "/api/tags"
2026-01-06T14:43:37.750Z [GIN] 2026/01/06 - 14:43:37 | 200 |      20.609µs |      10.0.1.184 | GET      "/api/ps"
2026-01-06T14:44:30.040Z [GIN] 2026/01/06 - 14:44:30 | 200 |     607.752µs |      10.0.1.184 | GET      "/api/tags"
2026-01-06T14:44:30.042Z [GIN] 2026/01/06 - 14:44:30 | 200 |      22.673µs |      10.0.1.184 | GET      "/api/ps"
2026-01-06T14:44:32.370Z time=2026-01-06T14:44:32.370Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42529"
2026-01-06T14:44:32.581Z time=2026-01-06T14:44:32.580Z level=INFO source=server.go:245 msg="enabling flash attention"
2026-01-06T14:44:32.581Z time=2026-01-06T14:44:32.581Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-b365302922688f3a3c9ac8e3c00ab97a152cac0cdbf4eb5a734ecb483ae3e511 --port 39805"
2026-01-06T14:44:32.581Z time=2026-01-06T14:44:32.581Z level=INFO source=sched.go:443 msg="system memory" total="16.0 GiB" free="13.5 GiB" free_swap="18.0 GiB"
2026-01-06T14:44:32.581Z time=2026-01-06T14:44:32.581Z level=INFO source=sched.go:450 msg="gpu memory" id=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede library=CUDA available="11.1 GiB" free="11.6 GiB" minimum="457.0 MiB" overhead="0 B"
2026-01-06T14:44:32.581Z time=2026-01-06T14:44:32.581Z level=INFO source=server.go:746 msg="loading model" "model layers"=37 requested=-1
2026-01-06T14:44:32.594Z time=2026-01-06T14:44:32.594Z level=INFO source=runner.go:1405 msg="starting ollama engine"
2026-01-06T14:44:32.594Z time=2026-01-06T14:44:32.594Z level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:39805"
2026-01-06T14:44:32.604Z time=2026-01-06T14:44:32.604Z level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:44:32.637Z time=2026-01-06T14:44:32.637Z level=INFO source=ggml.go:136 msg="" architecture=qwen3 file_type=Q4_K_M name=scb10x/typhoon2.5-qwen3-4b-preview description="" num_tensors=398 num_key_values=28
2026-01-06T14:44:32.642Z load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
2026-01-06T14:44:32.713Z ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
2026-01-06T14:44:32.713Z ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
2026-01-06T14:44:32.713Z ggml_cuda_init: found 1 CUDA devices:
2026-01-06T14:44:32.713Z Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, ID: GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede
2026-01-06T14:44:32.713Z load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
2026-01-06T14:44:32.713Z time=2026-01-06T14:44:32.713Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
2026-01-06T14:44:32.849Z time=2026-01-06T14:44:32.848Z level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.966Z level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.966Z level=INFO source=ggml.go:482 msg="offloading 36 repeating layers to GPU"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=ggml.go:494 msg="offloaded 37/37 layers to GPU"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="2.3 GiB"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=device.go:245 msg="model weights" device=CPU size="304.3 MiB"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="576.0 MiB"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="145.0 MiB"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="5.0 MiB"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=device.go:272 msg="total memory" size="3.3 GiB"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=sched.go:517 msg="loaded runners" count=1
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=server.go:1338 msg="waiting for llama runner to start responding"
2026-01-06T14:44:32.981Z time=2026-01-06T14:44:32.981Z level=INFO source=server.go:1372 msg="waiting for server to become available" status="llm server loading model"
2026-01-06T14:44:33.482Z time=2026-01-06T14:44:33.482Z level=INFO source=server.go:1376 msg="llama runner started in 0.90 seconds"
2026-01-06T14:44:33.666Z [GIN] 2026/01/06 - 14:44:33 | 200 |  1.433989555s |      10.0.1.184 | POST     "/api/chat"
2026-01-06T14:44:34.359Z [GIN] 2026/01/06 - 14:44:34 | 200 |  679.295647ms |      10.0.1.184 | POST     "/api/chat"
2026-01-06T14:44:34.713Z [GIN] 2026/01/06 - 14:44:34 | 200 |  351.121022ms |      10.0.1.184 | POST     "/api/chat"
2026-01-06T14:44:35.016Z [GIN] 2026/01/06 - 14:44:35 | 200 |   288.62871ms |      10.0.1.184 | POST     "/api/chat"
2026-01-06T14:45:32.440Z ggml_backend_cuda_device_get_memory device GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede utilizing NVML memory reporting free: 8989769728 total: 12884901888
2026-01-06T14:45:32.468Z time=2026-01-06T14:45:32.468Z level=INFO source=sched.go:583 msg="updated VRAM based on existing loaded models" gpu=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede library=CUDA total="12.0 GiB" available="8.4 GiB"
2026-01-06T14:45:32.549Z time=2026-01-06T14:45:32.548Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-df8b6415ce11eeaa85d11f8c4288c489aa3818354d9691d71523bcdffb5f2fa8 --port 34109"
2026-01-06T14:45:32.549Z time=2026-01-06T14:45:32.549Z level=INFO source=sched.go:443 msg="system memory" total="16.0 GiB" free="13.0 GiB" free_swap="18.0 GiB"
2026-01-06T14:45:32.549Z time=2026-01-06T14:45:32.549Z level=INFO source=sched.go:450 msg="gpu memory" id=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede library=CUDA available="7.9 GiB" free="8.4 GiB" minimum="457.0 MiB" overhead="0 B"
2026-01-06T14:45:32.549Z time=2026-01-06T14:45:32.549Z level=INFO source=server.go:746 msg="loading model" "model layers"=37 requested=-1
2026-01-06T14:45:32.563Z time=2026-01-06T14:45:32.563Z level=INFO source=runner.go:1405 msg="starting ollama engine"
2026-01-06T14:45:32.563Z time=2026-01-06T14:45:32.563Z level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:34109"
2026-01-06T14:45:32.571Z time=2026-01-06T14:45:32.571Z level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:45:32.609Z time=2026-01-06T14:45:32.609Z level=INFO source=ggml.go:136 msg="" architecture=qwen25vl file_type=Q4_K_M name="" description="" num_tensors=953 num_key_values=36
2026-01-06T14:45:32.615Z load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
2026-01-06T14:45:32.675Z ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
2026-01-06T14:45:32.676Z ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
2026-01-06T14:45:32.676Z ggml_cuda_init: found 1 CUDA devices:
2026-01-06T14:45:32.676Z Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, ID: GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede
2026-01-06T14:45:32.676Z load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
2026-01-06T14:45:32.676Z time=2026-01-06T14:45:32.676Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=server.go:1018 msg="model requires more gpu memory than is currently available, evicting a model to make space" "loaded layers"=9
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=runner.go:1278 msg=load request="{Operation:close LoraPath:[] Parallel:0 BatchSize:0 FlashAttention:Disabled KvSize:0 KvCacheType: NumThreads:0 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="3.0 GiB"
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=device.go:245 msg="model weights" device=CPU size="243.4 MiB"
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="144.0 MiB"
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="7.5 GiB"
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="20.3 MiB"
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=device.go:272 msg="total memory" size="10.8 GiB"
2026-01-06T14:45:33.879Z ggml_backend_cuda_device_get_memory device GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede utilizing NVML memory reporting free: 8851292160 total: 12884901888
2026-01-06T14:45:34.133Z time=2026-01-06T14:45:34.133Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 34319"
2026-01-06T14:45:34.237Z time=2026-01-06T14:45:34.237Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40637"
2026-01-06T14:45:34.423Z time=2026-01-06T14:45:34.423Z level=INFO source=sched.go:443 msg="system memory" total="16.0 GiB" free="12.3 GiB" free_swap="18.0 GiB"
2026-01-06T14:45:34.423Z time=2026-01-06T14:45:34.423Z level=INFO source=sched.go:450 msg="gpu memory" id=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede library=CUDA available="11.0 GiB" free="11.4 GiB" minimum="457.0 MiB" overhead="0 B"
2026-01-06T14:45:34.423Z time=2026-01-06T14:45:34.423Z level=INFO source=server.go:746 msg="loading model" "model layers"=37 requested=-1
2026-01-06T14:45:34.424Z time=2026-01-06T14:45:34.424Z level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:45:35.022Z time=2026-01-06T14:45:35.022Z level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:45:35.929Z time=2026-01-06T14:45:35.929Z level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:45:35.929Z time=2026-01-06T14:45:35.929Z level=INFO source=ggml.go:482 msg="offloading 36 repeating layers to GPU"
2026-01-06T14:45:35.929Z time=2026-01-06T14:45:35.929Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
2026-01-06T14:45:35.929Z time=2026-01-06T14:45:35.929Z level=INFO source=ggml.go:494 msg="offloaded 37/37 layers to GPU"
2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="3.0 GiB"
2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=device.go:245 msg="model weights" device=CPU size="243.4 MiB"
2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="144.0 MiB"
2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="7.5 GiB"
2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="20.3 MiB"
2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=device.go:272 msg="total memory" size="10.8 GiB"
2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=sched.go:517 msg="loaded runners" count=1
2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=server.go:1338 msg="waiting for llama runner to start responding"
2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.930Z level=INFO source=server.go:1372 msg="waiting for server to become available" status="llm server loading model"
2026-01-06T14:45:36.683Z time=2026-01-06T14:45:36.683Z level=INFO source=server.go:1376 msg="llama runner started in 4.13 seconds"
2026-01-06T14:46:00.369Z ggml.c:4081: GGML_ASSERT(a->ne[2] * 4 == b->ne[0]) failed
2026-01-06T14:46:00.400Z /usr/bin/ollama(+0x110c8d8)[0x57b24d04d8d8]
2026-01-06T14:46:00.400Z /usr/bin/ollama(+0x110ccb7)[0x57b24d04dcb7]
2026-01-06T14:46:00.400Z /usr/bin/ollama(+0x110ce3d)[0x57b24d04de3d]
2026-01-06T14:46:00.400Z /usr/bin/ollama(+0x1114f1e)[0x57b24d055f1e]
2026-01-06T14:46:00.400Z /usr/bin/ollama(+0x10c4011)[0x57b24d005011]
2026-01-06T14:46:00.400Z /usr/bin/ollama(+0x37e681)[0x57b24c2bf681]
2026-01-06T14:46:00.428Z SIGABRT: abort
2026-01-06T14:46:00.428Z PC=0x759a3b1c3b2c m=7 sigcode=18446744073709551610
2026-01-06T14:46:00.428Z signal arrived during cgo execution
2026-01-06T14:46:00.428Z goroutine 8 gp=0xc000582700 m=7 mp=0xc000580008 [syscall]:
2026-01-06T14:46:00.428Z runtime.cgocall(0x57b24d004fc0, 0xc000b03138)
2026-01-06T14:46:00.428Z runtime/cgocall.go:167 +0x4b fp=0xc000b03110 sp=0xc000b030d8 pc=0x57b24c2b46eb
2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_rope_multi(0x7599d85fa540, 0x7599d88f5e10, 0x7599d88f5ca0, 0x0, 0x80, 0xc0174882d0, 0x8, 0x20000, 0x49742400, 0x3f800000, ...)
2026-01-06T14:46:00.428Z _cgo_gotypes.go:2066 +0x4b fp=0xc000b03138 sp=0xc000b03110 pc=0x57b24c6ee8ab
2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/backend/ggml.(*Tensor).RoPE.func2(...)
2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/backend/ggml/ggml.go:1543
2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/backend/ggml.(*Tensor).RoPE(0xc0006622a0, {0x57b24d878250, 0xc0002e2e40}, {0x57b24d882b20, 0xc000662288}, 0x80, 0x49742400, 0x3f800000, {0xc00100f530, 0x1, ...})
2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/backend/ggml/ggml.go:1543 +0x61a fp=0xc000b03290 sp=0xc000b03138 pc=0x57b24c6fe97a
2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/nn.RoPE({0x57b24d878250?, 0xc0002e2e40?}, {0x57b24d882b20?, 0xc0006622a0?}, {0x57b24d882b20?, 0xc000662288?}, 0x57b24c6fe096?, 0xd85fa540?, 0x7599?, {0xc00100f530, ...})
2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/nn/rope.go:16 +0x86 fp=0xc000b032f0 sp=0xc000b03290 pc=0x57b24c7317c6
2026-01-06T14:46:00.428Z github.com/ollama/ollama/model/models/qwen25vl.TextOptions.applyRotaryPositionEmbeddings({0x800, 0x10, 0x2, 0x80, 0x1f400, 0x358637bd, 0x49742400, 0x3f800000, {0xc003e67860, 0x3, ...}}, ...)
2026-01-06T14:46:00.428Z github.com/ollama/ollama/model/models/qwen25vl/model_text.go:21 +0x172 fp=0xc000b03378 sp=0xc000b032f0 pc=0x57b24c7b84b2
2026-01-06T14:46:00.428Z github.com/ollama/ollama/model/models/qwen25vl.(*TextModel).Shift(...)
2026-01-06T14:46:00.428Z github.com/ollama/ollama/model/models/qwen25vl/model_text.go:94
2026-01-06T14:46:00.428Z github.com/ollama/ollama/model/models/qwen25vl.(*TextModel).Shift-fm({0x57b24d878250?, 0xc0002e2e40?}, 0xc0002e2e40?, {0x57b24d882b20?, 0xc0006622a0?}, {0x57b24d882b20?, 0xc000662288?})
2026-01-06T14:46:00.428Z <autogenerated>:1 +0x14f fp=0xc000b03460 sp=0xc000b03378 pc=0x57b24c7bdf4f
2026-01-06T14:46:00.428Z github.com/ollama/ollama/kvcache.(*Causal).shift(0xc0001f0800, 0x0, 0x4, 0xfffff4c4)
2026-01-06T14:46:00.428Z github.com/ollama/ollama/kvcache/causal.go:599 +0x507 fp=0xc000b035c0 sp=0xc000b03460 pc=0x57b24c6e56c7
2026-01-06T14:46:00.428Z github.com/ollama/ollama/kvcache.(*Causal).Remove(0xc0001f0800, 0x0, 0x4, 0xb40)
2026-01-06T14:46:00.428Z github.com/ollama/ollama/kvcache/causal.go:659 +0x285 fp=0xc000b03658 sp=0xc000b035c0 pc=0x57b24c6e5aa5
2026-01-06T14:46:00.428Z github.com/ollama/ollama/runner/ollamarunner.(*InputCache).ShiftCacheSlot(0xc0043ad140, 0xc0043ad100, 0x4)
2026-01-06T14:46:00.428Z github.com/ollama/ollama/runner/ollamarunner/cache.go:290 +0x34c fp=0xc000b037f0 sp=0xc000b03658 pc=0x57b24c7cc8ec
2026-01-06T14:46:00.428Z github.com/ollama/ollama/runner/ollamarunner.(*Server).forwardBatch(_, {0x4bd, {0x57b24d878250, 0xc0002e2c80}, {0x57b24d882b20, 0xc000662228}, {0xc00100f2d8, 0x1, 0x1}, {{0x57b24d882b20, ...}, ...}, ...})
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner/runner.go:565 +0xec5 fp=0xc000b03b58 sp=0xc000b037f0 pc=0x57b24c7d0085
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000226f00, {0x57b24d86d410, 0xc000533540})
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner/runner.go:452 +0x18c fp=0xc000b03fb8 sp=0xc000b03b58 pc=0x57b24c7cef6c
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner/runner.go:1418 +0x28 fp=0xc000b03fe0 sp=0xc000b03fb8 pc=0x57b24c7d8668
2026-01-06T14:46:00.429Z runtime.goexit({})
2026-01-06T14:46:00.429Z runtime/asm_amd64.s:1700 +0x1 fp=0xc000b03fe8 sp=0xc000b03fe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.429Z created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner/runner.go:1418 +0x4c9
2026-01-06T14:46:00.429Z goroutine 1 gp=0xc000002380 m=nil [IO wait]:
2026-01-06T14:46:00.429Z runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
2026-01-06T14:46:00.429Z runtime/proc.go:435 +0xce fp=0xc000b05790 sp=0xc000b05770 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.429Z runtime.netpollblock(0xc00051f7e0?, 0x4c2512a6?, 0xb2?)
2026-01-06T14:46:00.429Z runtime/netpoll.go:575 +0xf7 fp=0xc000b057c8 sp=0xc000b05790 pc=0x57b24c27ce97
2026-01-06T14:46:00.429Z internal/poll.runtime_pollWait(0x759a3ae56eb0, 0x72)
2026-01-06T14:46:00.429Z runtime/netpoll.go:351 +0x85 fp=0xc000b057e8 sp=0xc000b057c8 pc=0x57b24c2b6d85
2026-01-06T14:46:00.429Z internal/poll.(*pollDesc).wait(0xc00062ff00?, 0x900000036?, 0x0)
2026-01-06T14:46:00.429Z internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000b05810 sp=0xc000b057e8 pc=0x57b24c33ef07
2026-01-06T14:46:00.429Z internal/poll.(*pollDesc).waitRead(...)
2026-01-06T14:46:00.429Z internal/poll/fd_poll_runtime.go:89
2026-01-06T14:46:00.429Z internal/poll.(*FD).Accept(0xc00062ff00)
2026-01-06T14:46:00.429Z internal/poll/fd_unix.go:620 +0x295 fp=0xc000b058b8 sp=0xc000b05810 pc=0x57b24c3442d5
2026-01-06T14:46:00.429Z net.(*netFD).accept(0xc00062ff00)
2026-01-06T14:46:00.429Z net/fd_unix.go:172 +0x29 fp=0xc000b05970 sp=0xc000b058b8 pc=0x57b24c3b71a9
2026-01-06T14:46:00.429Z net.(*TCPListener).accept(0xc000415480)
2026-01-06T14:46:00.429Z net/tcpsock_posix.go:159 +0x1b fp=0xc000b059c0 sp=0xc000b05970 pc=0x57b24c3ccb5b
2026-01-06T14:46:00.429Z net.(*TCPListener).Accept(0xc000415480)
2026-01-06T14:46:00.429Z net/tcpsock.go:380 +0x30 fp=0xc000b059f0 sp=0xc000b059c0 pc=0x57b24c3cba10
2026-01-06T14:46:00.429Z net/http.(*onceCloseListener).Accept(0xc0004b43f0?)
2026-01-06T14:46:00.429Z <autogenerated>:1 +0x24 fp=0xc000b05a08 sp=0xc000b059f0 pc=0x57b24c5e31e4
2026-01-06T14:46:00.429Z net/http.(*Server).Serve(0xc00050ef00, {0x57b24d86adc0, 0xc000415480})
2026-01-06T14:46:00.429Z net/http/server.go:3424 +0x30c fp=0xc000b05b38 sp=0xc000b05a08 pc=0x57b24c5baaac
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner.Execute({0xc0000340a0, 0x4, 0x4})
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner/runner.go:1441 +0x94e fp=0xc000b05d08 sp=0xc000b05b38 pc=0x57b24c7d83ee
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner.Execute({0xc000034080?, 0x0?, 0x0?})
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc000b05d30 sp=0xc000b05d08 pc=0x57b24c7d8ce9
2026-01-06T14:46:00.429Z github.com/ollama/ollama/cmd.NewCLI.func2(0xc00050ed00?, {0x57b24d34d0ad?, 0x4?, 0x57b24d34d0b1?})
2026-01-06T14:46:00.429Z github.com/ollama/ollama/cmd/cmd.go:1841 +0x45 fp=0xc000b05d58 sp=0xc000b05d30 pc=0x57b24cf95f25
2026-01-06T14:46:00.429Z github.com/spf13/cobra.(*Command).execute(0xc00067b808, {0xc0005334a0, 0x5, 0x5})
2026-01-06T14:46:00.429Z github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000b05e78 sp=0xc000b05d58 pc=0x57b24c4307fc
2026-01-06T14:46:00.429Z github.com/spf13/cobra.(*Command).ExecuteC(0xc00054c908)
2026-01-06T14:46:00.429Z github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000b05f30 sp=0xc000b05e78 pc=0x57b24c431045
2026-01-06T14:46:00.429Z github.com/spf13/cobra.(*Command).Execute(...)
2026-01-06T14:46:00.429Z github.com/spf13/cobra@v1.7.0/command.go:992
2026-01-06T14:46:00.429Z github.com/spf13/cobra.(*Command).ExecuteContext(...)
2026-01-06T14:46:00.429Z github.com/spf13/cobra@v1.7.0/command.go:985
2026-01-06T14:46:00.429Z main.main()
2026-01-06T14:46:00.429Z github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000b05f50 sp=0xc000b05f30 pc=0x57b24cf96a0d
2026-01-06T14:46:00.429Z runtime.main()
2026-01-06T14:46:00.429Z runtime/proc.go:283 +0x29d fp=0xc000b05fe0 sp=0xc000b05f50 pc=0x57b24c28451d
2026-01-06T14:46:00.429Z runtime.goexit({})
2026-01-06T14:46:00.429Z runtime/asm_amd64.s:1700 +0x1 fp=0xc000b05fe8 sp=0xc000b05fe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.429Z goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
2026-01-06T14:46:00.429Z runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
2026-01-06T14:46:00.429Z runtime/proc.go:435 +0xce fp=0xc000072fa8 sp=0xc000072f88 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.429Z runtime.goparkunlock(...)
2026-01-06T14:46:00.429Z runtime/proc.go:441
2026-01-06T14:46:00.429Z runtime.forcegchelper()
2026-01-06T14:46:00.429Z runtime/proc.go:348 +0xb8 fp=0xc000072fe0 sp=0xc000072fa8 pc=0x57b24c284858
2026-01-06T14:46:00.429Z runtime.goexit({})
2026-01-06T14:46:00.429Z runtime/asm_amd64.s:1700 +0x1 fp=0xc000072fe8 sp=0xc000072fe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.429Z created by runtime.init.7 in goroutine 1
2026-01-06T14:46:00.429Z runtime/proc.go:336 +0x1a
2026-01-06T14:46:00.429Z goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
2026-01-06T14:46:00.429Z runtime.gopark(0x57b24e13f701?, 0x0?, 0x0?, 0x0?, 0x0?)
2026-01-06T14:46:00.429Z runtime/proc.go:435 +0xce fp=0xc000073780 sp=0xc000073760 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.429Z runtime.goparkunlock(...)
2026-01-06T14:46:00.429Z runtime/proc.go:441
2026-01-06T14:46:00.429Z runtime.bgsweep(0xc00007e000)
2026-01-06T14:46:00.429Z runtime/mgcsweep.go:316 +0xdf fp=0xc0000737c8 sp=0xc000073780 pc=0x57b24c26efff
2026-01-06T14:46:00.429Z runtime.gcenable.gowrap1()
2026-01-06T14:46:00.429Z runtime/mgc.go:204 +0x25 fp=0xc0000737e0 sp=0xc0000737c8 pc=0x57b24c2633e5
2026-01-06T14:46:00.429Z runtime.goexit({})
2026-01-06T14:46:00.429Z runtime/asm_amd64.s:1700 +0x1 fp=0xc0000737e8 sp=0xc0000737e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.429Z created by runtime.gcenable in goroutine 1
2026-01-06T14:46:00.429Z runtime/mgc.go:204 +0x66
2026-01-06T14:46:00.429Z goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
2026-01-06T14:46:00.429Z runtime.gopark(0x6cfcb6?, 0x6931a1?, 0x0?, 0x0?, 0x0?)
2026-01-06T14:46:00.429Z runtime/proc.go:435 +0xce fp=0xc000073f78 sp=0xc000073f58 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.429Z runtime.goparkunlock(...)
2026-01-06T14:46:00.429Z runtime/proc.go:441
2026-01-06T14:46:00.429Z runtime.(*scavengerState).park(0x57b24e141280)
2026-01-06T14:46:00.429Z runtime/mgcscavenge.go:425 +0x49 fp=0xc000073fa8 sp=0xc000073f78 pc=0x57b24c26ca49
2026-01-06T14:46:00.429Z runtime.bgscavenge(0xc00007e000)
2026-01-06T14:46:00.429Z runtime/mgcscavenge.go:658 +0x59 fp=0xc000073fc8 sp=0xc000073fa8 pc=0x57b24c26cfd9
2026-01-06T14:46:00.429Z runtime.gcenable.gowrap2()
2026-01-06T14:46:00.429Z runtime/mgc.go:205 +0x25 fp=0xc000073fe0 sp=0xc000073fc8 pc=0x57b24c263385
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc000073fe8 sp=0xc000073fe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.gcenable in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:205 +0xa5
2026-01-06T14:46:00.430Z goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
2026-01-06T14:46:00.430Z runtime.gopark(0x1b8?, 0x57b24d849020?, 0x1?, 0x23?, 0x57b24c2bda14?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc000072630 sp=0xc000072610 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.runfinq()
2026-01-06T14:46:00.430Z runtime/mfinal.go:196 +0x107 fp=0xc0000727e0 sp=0xc000072630 pc=0x57b24c2623a7
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.createfing in goroutine 1
2026-01-06T14:46:00.430Z runtime/mfinal.go:166 +0x3d
2026-01-06T14:46:00.430Z goroutine 6 gp=0xc0001ce8c0 m=nil [chan receive]:
2026-01-06T14:46:00.430Z runtime.gopark(0xc000223680?, 0xc001002018?, 0x60?, 0x47?, 0x57b24c39dde8?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc000074718 sp=0xc0000746f8 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.chanrecv(0xc00003e380, 0x0, 0x1)
2026-01-06T14:46:00.430Z runtime/chan.go:664 +0x445 fp=0xc000074790 sp=0xc000074718 pc=0x57b24c253e85
2026-01-06T14:46:00.430Z runtime.chanrecv1(0x0?, 0x0?)
2026-01-06T14:46:00.430Z runtime/chan.go:506 +0x12 fp=0xc0000747b8 sp=0xc000074790 pc=0x57b24c253a12
2026-01-06T14:46:00.430Z runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
2026-01-06T14:46:00.430Z runtime/mgc.go:1796
2026-01-06T14:46:00.430Z runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1799 +0x2f fp=0xc0000747e0 sp=0xc0000747b8 pc=0x57b24c26658f
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc0000747e8 sp=0xc0000747e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by unique.runtime_registerUniqueMapCleanup in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:1794 +0x85
2026-01-06T14:46:00.430Z goroutine 7 gp=0xc0001cee00 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.430Z runtime.gopark(0x2c05bd5c2bf5?, 0x3?, 0x6d?, 0x2?, 0x0?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc000074f38 sp=0xc000074f18 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.430Z runtime/mgc.go:1423 +0xe9 fp=0xc000074fc8 sp=0xc000074f38 pc=0x57b24c2658a9
2026-01-06T14:46:00.430Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x25 fp=0xc000074fe0 sp=0xc000074fc8 pc=0x57b24c265785
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.430Z goroutine 18 gp=0xc000102380 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.430Z runtime.gopark(0x2c05fd6a3f20?, 0x3?, 0x65?, 0xe?, 0x0?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc00006e738 sp=0xc00006e718 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.430Z runtime/mgc.go:1423 +0xe9 fp=0xc00006e7c8 sp=0xc00006e738 pc=0x57b24c2658a9
2026-01-06T14:46:00.430Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x25 fp=0xc00006e7e0 sp=0xc00006e7c8 pc=0x57b24c265785
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00006e7e8 sp=0xc00006e7e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.430Z goroutine 19 gp=0xc000102540 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.430Z runtime.gopark(0x2c05fd6b0afd?, 0x1?, 0xdf?, 0xc?, 0x0?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc00006ef38 sp=0xc00006ef18 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.430Z runtime/mgc.go:1423 +0xe9 fp=0xc00006efc8 sp=0xc00006ef38 pc=0x57b24c2658a9
2026-01-06T14:46:00.430Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x25 fp=0xc00006efe0 sp=0xc00006efc8 pc=0x57b24c265785
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00006efe8 sp=0xc00006efe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.430Z goroutine 20 gp=0xc000102700 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.430Z runtime.gopark(0x2c05fd5f8e80?, 0x3?, 0xc9?, 0xd7?, 0x0?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc00006f738 sp=0xc00006f718 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.430Z runtime/mgc.go:1423 +0xe9 fp=0xc00006f7c8 sp=0xc00006f738 pc=0x57b24c2658a9
2026-01-06T14:46:00.430Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x25 fp=0xc00006f7e0 sp=0xc00006f7c8 pc=0x57b24c265785
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00006f7e8 sp=0xc00006f7e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.430Z goroutine 21 gp=0xc0001028c0 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.430Z runtime.gopark(0x57b24e20f680?, 0x1?, 0x59?, 0x11?, 0x0?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc00006ff38 sp=0xc00006ff18 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.430Z runtime/mgc.go:1423 +0xe9 fp=0xc00006ffc8 sp=0xc00006ff38 pc=0x57b24c2658a9
2026-01-06T14:46:00.430Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x25 fp=0xc00006ffe0 sp=0xc00006ffc8 pc=0x57b24c265785
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00006ffe8 sp=0xc00006ffe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.430Z goroutine 34 gp=0xc000504000 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.430Z runtime.gopark(0x2c05fd78f15d?, 0x1?, 0xb6?, 0xf9?, 0x0?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc00050a738 sp=0xc00050a718 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.430Z runtime/mgc.go:1423 +0xe9 fp=0xc00050a7c8 sp=0xc00050a738 pc=0x57b24c2658a9
2026-01-06T14:46:00.430Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x25 fp=0xc00050a7e0 sp=0xc00050a7c8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050a7e8 sp=0xc00050a7e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 35 gp=0xc0005041c0 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.431Z runtime.gopark(0x2c05fd6aa65e?, 0x3?, 0xbf?, 0x72?, 0x0?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc00050af38 sp=0xc00050af18 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.431Z runtime/mgc.go:1423 +0xe9 fp=0xc00050afc8 sp=0xc00050af38 pc=0x57b24c2658a9
2026-01-06T14:46:00.431Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x25 fp=0xc00050afe0 sp=0xc00050afc8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050afe8 sp=0xc00050afe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 36 gp=0xc000504380 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.431Z runtime.gopark(0x2c05fd5ce92a?, 0x3?, 0x4c?, 0x6f?, 0x0?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc00050b738 sp=0xc00050b718 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.431Z runtime/mgc.go:1423 +0xe9 fp=0xc00050b7c8 sp=0xc00050b738 pc=0x57b24c2658a9
2026-01-06T14:46:00.431Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x25 fp=0xc00050b7e0 sp=0xc00050b7c8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050b7e8 sp=0xc00050b7e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 37 gp=0xc000504540 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.431Z runtime.gopark(0x2c05fd6900a2?, 0x1?, 0xa1?, 0xf2?, 0x0?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc00050bf38 sp=0xc00050bf18 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.431Z runtime/mgc.go:1423 +0xe9 fp=0xc00050bfc8 sp=0xc00050bf38 pc=0x57b24c2658a9
2026-01-06T14:46:00.431Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x25 fp=0xc00050bfe0 sp=0xc00050bfc8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050bfe8 sp=0xc00050bfe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 38 gp=0xc000504700 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.431Z runtime.gopark(0x2c05fd78c896?, 0x3?, 0xe3?, 0x96?, 0x0?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc00050c738 sp=0xc00050c718 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.431Z runtime/mgc.go:1423 +0xe9 fp=0xc00050c7c8 sp=0xc00050c738 pc=0x57b24c2658a9
2026-01-06T14:46:00.431Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x25 fp=0xc00050c7e0 sp=0xc00050c7c8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050c7e8 sp=0xc00050c7e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 39 gp=0xc0005048c0 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.431Z runtime.gopark(0x57b24e20f680?, 0x1?, 0xf3?, 0xc?, 0x0?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc00050cf38 sp=0xc00050cf18 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.431Z runtime/mgc.go:1423 +0xe9 fp=0xc00050cfc8 sp=0xc00050cf38 pc=0x57b24c2658a9
2026-01-06T14:46:00.431Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x25 fp=0xc00050cfe0 sp=0xc00050cfc8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050cfe8 sp=0xc00050cfe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 40 gp=0xc000504a80 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.431Z runtime.gopark(0x2c05fd6a2986?, 0x3?, 0xae?, 0x20?, 0x0?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc00050d738 sp=0xc00050d718 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.431Z runtime/mgc.go:1423 +0xe9 fp=0xc00050d7c8 sp=0xc00050d738 pc=0x57b24c2658a9
2026-01-06T14:46:00.431Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x25 fp=0xc00050d7e0 sp=0xc00050d7c8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050d7e8 sp=0xc00050d7e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 9 gp=0xc0005828c0 m=nil [select]:
2026-01-06T14:46:00.431Z runtime.gopark(0xc016187a08?, 0x2?, 0x4?, 0x0?, 0xc01618786c?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc016187698 sp=0xc016187678 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.selectgo(0xc016187a08, 0xc016187868, 0xc000552e40?, 0x0, 0x1?, 0x1)
2026-01-06T14:46:00.431Z runtime/select.go:351 +0x837 fp=0xc0161877d0 sp=0xc016187698 pc=0x57b24c296a17
2026-01-06T14:46:00.431Z github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc000226f00, {0x57b24d86afa0, 0xc000aee000}, 0xc000494000)
2026-01-06T14:46:00.431Z github.com/ollama/ollama/runner/ollamarunner/runner.go:950 +0xc4e fp=0xc016187ac0 sp=0xc0161877d0 pc=0x57b24c7d368e
2026-01-06T14:46:00.431Z github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x57b24d86afa0?, 0xc000aee000?}, 0xc000049b40?)
2026-01-06T14:46:00.431Z <autogenerated>:1 +0x36 fp=0xc016187af0 sp=0xc016187ac0 pc=0x57b24c7d8b56
2026-01-06T14:46:00.431Z net/http.HandlerFunc.ServeHTTP(0xc000538a80?, {0x57b24d86afa0?, 0xc000aee000?}, 0xc000049b60?)
2026-01-06T14:46:00.432Z net/http/server.go:2294 +0x29 fp=0xc016187b18 sp=0xc016187af0 pc=0x57b24c5b70e9
2026-01-06T14:46:00.432Z net/http.(*ServeMux).ServeHTTP(0x57b24c25c8c5?, {0x57b24d86afa0, 0xc000aee000}, 0xc000494000)
2026-01-06T14:46:00.432Z net/http/server.go:2822 +0x1c4 fp=0xc016187b68 sp=0xc016187b18 pc=0x57b24c5b8fe4
2026-01-06T14:46:00.432Z net/http.serverHandler.ServeHTTP({0x57b24d867590?}, {0x57b24d86afa0?, 0xc000aee000?}, 0x1?)
2026-01-06T14:46:00.432Z net/http/server.go:3301 +0x8e fp=0xc016187b98 sp=0xc016187b68 pc=0x57b24c5d6a6e
2026-01-06T14:46:00.432Z net/http.(*conn).serve(0xc0004b43f0, {0x57b24d86d3d8, 0xc00021c2a0})
2026-01-06T14:46:00.432Z net/http/server.go:2102 +0x625 fp=0xc016187fb8 sp=0xc016187b98 pc=0x57b24c5b55e5
2026-01-06T14:46:00.432Z net/http.(*Server).Serve.gowrap3()
2026-01-06T14:46:00.432Z net/http/server.go:3454 +0x28 fp=0xc016187fe0 sp=0xc016187fb8 pc=0x57b24c5baea8
2026-01-06T14:46:00.432Z runtime.goexit({})
2026-01-06T14:46:00.432Z runtime/asm_amd64.s:1700 +0x1 fp=0xc016187fe8 sp=0xc016187fe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.432Z created by net/http.(*Server).Serve in goroutine 1
2026-01-06T14:46:00.432Z net/http/server.go:3454 +0x485
2026-01-06T14:46:00.432Z goroutine 830 gp=0xc000582fc0 m=nil [IO wait]:
2026-01-06T14:46:00.432Z runtime.gopark(0xff800000ff800000?, 0xff800000ff800000?, 0x0?, 0x0?, 0xb?)
2026-01-06T14:46:00.432Z runtime/proc.go:435 +0xce fp=0xc0009875d8 sp=0xc0009875b8 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.432Z runtime.netpollblock(0x57b24c2db338?, 0x4c2512a6?, 0xb2?)
2026-01-06T14:46:00.432Z runtime/netpoll.go:575 +0xf7 fp=0xc000987610 sp=0xc0009875d8 pc=0x57b24c27ce97
2026-01-06T14:46:00.432Z internal/poll.runtime_pollWait(0x759a3ae56d98, 0x72)
2026-01-06T14:46:00.432Z runtime/netpoll.go:351 +0x85 fp=0xc000987630 sp=0xc000987610 pc=0x57b24c2b6d85
2026-01-06T14:46:00.432Z internal/poll.(*pollDesc).wait(0xc0001ca000?, 0xc000272041?, 0x0)
2026-01-06T14:46:00.432Z internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000987658 sp=0xc000987630 pc=0x57b24c33ef07
2026-01-06T14:46:00.432Z internal/poll.(*pollDesc).waitRead(...)
2026-01-06T14:46:00.432Z internal/poll/fd_poll_runtime.go:89
2026-01-06T14:46:00.432Z internal/poll.(*FD).Read(0xc0001ca000, {0xc000272041, 0x1, 0x1})
2026-01-06T14:46:00.432Z internal/poll/fd_unix.go:165 +0x27a fp=0xc0009876f0 sp=0xc000987658 pc=0x57b24c3401fa
2026-01-06T14:46:00.432Z net.(*netFD).Read(0xc0001ca000, {0xc000272041?, 0xc00053c4d8?, 0xc000987770?})
2026-01-06T14:46:00.432Z net/fd_posix.go:55 +0x25 fp=0xc000987738 sp=0xc0009876f0 pc=0x57b24c3b5205
2026-01-06T14:46:00.432Z net.(*conn).Read(0xc000076658, {0xc000272041?, 0xc0043ad840?, 0x57b24c620500?})
2026-01-06T14:46:00.432Z net/net.go:194 +0x45 fp=0xc000987780 sp=0xc000987738 pc=0x57b24c3c35c5
2026-01-06T14:46:00.432Z net/http.(*connReader).backgroundRead(0xc000272030)
2026-01-06T14:46:00.432Z net/http/server.go:690 +0x37 fp=0xc0009877c8 sp=0xc000987780 pc=0x57b24c5af4b7
2026-01-06T14:46:00.432Z net/http.(*connReader).startBackgroundRead.gowrap2()
2026-01-06T14:46:00.432Z net/http/server.go:686 +0x25 fp=0xc0009877e0 sp=0xc0009877c8 pc=0x57b24c5af3e5
2026-01-06T14:46:00.432Z runtime.goexit({})
2026-01-06T14:46:00.432Z runtime/asm_amd64.s:1700 +0x1 fp=0xc0009877e8 sp=0xc0009877e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.432Z created by net/http.(*connReader).startBackgroundRead in goroutine 9
2026-01-06T14:46:00.432Z net/http/server.go:686 +0xb6
2026-01-06T14:46:00.432Z goroutine 3450 gp=0xc0001cf880 m=nil [sync.Mutex.Lock]:
2026-01-06T14:46:00.432Z runtime.gopark(0xc00055b008?, 0xc00012de90?, 0x60?, 0x98?, 0x57b24c2b5679?)
2026-01-06T14:46:00.432Z runtime/proc.go:435 +0xce fp=0xc00008aa88 sp=0xc00008aa68 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.432Z runtime.goparkunlock(...)
2026-01-06T14:46:00.432Z runtime/proc.go:441
2026-01-06T14:46:00.432Z runtime.semacquire1(0xc000226ffc, 0x0, 0x3, 0x2, 0x15)
2026-01-06T14:46:00.432Z runtime/sema.go:188 +0x229 fp=0xc00008aaf0 sp=0xc00008aa88 pc=0x57b24c297ae9
2026-01-06T14:46:00.432Z internal/sync.runtime_SemacquireMutex(0x57b24c666e74?, 0xd8?, 0xc00012de90?)
2026-01-06T14:46:00.432Z runtime/sema.go:95 +0x25 fp=0xc00008ab28 sp=0xc00008aaf0 pc=0x57b24c2b9385
2026-01-06T14:46:00.432Z internal/sync.(*Mutex).lockSlow(0xc000226ff8)
2026-01-06T14:46:00.432Z internal/sync/mutex.go:149 +0x15d fp=0xc00008ab78 sp=0xc00008ab28 pc=0x57b24c2c94dd
2026-01-06T14:46:00.432Z internal/sync.(*Mutex).Lock(...)
2026-01-06T14:46:00.432Z internal/sync/mutex.go:70
2026-01-06T14:46:00.432Z sync.(*Mutex).Lock(...)
2026-01-06T14:46:00.432Z sync/mutex.go:46
2026-01-06T14:46:00.432Z github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0xc000226f00, {0x4bd, {0x57b24d878250, 0xc0002e2c80}, {0x57b24d882b20, 0xc000662228}, {0xc00100f2d8, 0x1, 0x1}, {{0x57b24d882b20, ...}, ...}, ...})
2026-01-06T14:46:00.432Z github.com/ollama/ollama/runner/ollamarunner/runner.go:735 +0x972 fp=0xc00008aef0 sp=0xc00008ab78 pc=0x57b24c7d1692
2026-01-06T14:46:00.432Z github.com/ollama/ollama/runner/ollamarunner.(*Server).run.gowrap1()
2026-01-06T14:46:00.432Z github.com/ollama/ollama/runner/ollamarunner/runner.go:458 +0x58 fp=0xc00008afe0 sp=0xc00008aef0 pc=0x57b24c7cf198
2026-01-06T14:46:00.432Z runtime.goexit({})
2026-01-06T14:46:00.432Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00008afe8 sp=0xc00008afe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.432Z created by github.com/ollama/ollama/runner/ollamarunner.(*Server).run in goroutine 8
2026-01-06T14:46:00.432Z github.com/ollama/ollama/runner/ollamarunner/runner.go:458 +0x2cd
2026-01-06T14:46:00.432Z rax    0x0
2026-01-06T14:46:00.432Z rbx    0x95
2026-01-06T14:46:00.432Z rcx    0x759a3b1c3b2c
2026-01-06T14:46:00.432Z rdx    0x6
2026-01-06T14:46:00.432Z rdi    0x8f
2026-01-06T14:46:00.432Z rsi    0x95
2026-01-06T14:46:00.432Z rbp    0x7599f252a330
2026-01-06T14:46:00.432Z rsp    0x7599f252a2f0
2026-01-06T14:46:00.432Z r8     0x0
2026-01-06T14:46:00.432Z r9     0x7
2026-01-06T14:46:00.432Z r10    0x8
2026-01-06T14:46:00.432Z r11    0x246
2026-01-06T14:46:00.432Z r12    0x6
2026-01-06T14:46:00.432Z r13    0x57b24d54cfdc
2026-01-06T14:46:00.432Z r14    0x16
2026-01-06T14:46:00.432Z r15    0x49742400
2026-01-06T14:46:00.432Z rip    0x759a3b1c3b2c
2026-01-06T14:46:00.432Z rflags 0x246
2026-01-06T14:46:00.432Z cs     0x33
2026-01-06T14:46:00.432Z fs     0x0
2026-01-06T14:46:00.432Z gs     0x0
2026-01-06T14:46:00.579Z time=2026-01-06T14:46:00.579Z level=ERROR source=server.go:302 msg="llama runner terminated" error="exit status 2"
2026-01-06T14:46:00.579Z [GIN] 2026/01/06 - 14:46:00 | 500 | 28.388806197s |      10.0.1.232 | POST     "/v1/chat/completions"
2026-01-06T14:46:01.292Z time=2026-01-06T14:46:01.292Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42293"
2026-01-06T14:46:01.673Z time=2026-01-06T14:46:01.673Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44257"
2026-01-06T14:46:01.928Z time=2026-01-06T14:46:01.926Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36617"
2026-01-06T14:46:02.173Z time=2026-01-06T14:46:02.173Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35851"
2026-01-06T14:46:02.423Z time=2026-01-06T14:46:02.423Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43455"
2026-01-06T14:46:02.673Z time=2026-01-06T14:46:02.673Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 39881"
2026-01-06T14:46:02.923Z time=2026-01-06T14:46:02.923Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 45497"
2026-01-06T14:46:03.173Z time=2026-01-06T14:46:03.173Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 33945"
2026-01-06T14:46:03.423Z time=2026-01-06T14:46:03.423Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 33415"
2026-01-06T14:46:03.673Z time=2026-01-06T14:46:03.673Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41883"
2026-01-06T14:46:03.924Z time=2026-01-06T14:46:03.923Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35599"
2026-01-06T14:46:04.173Z time=2026-01-06T14:46:04.173Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 34825"
2026-01-06T14:46:04.423Z time=2026-01-06T14:46:04.423Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41857"
2026-01-06T14:46:04.676Z time=2026-01-06T14:46:04.676Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35041"
2026-01-06T14:46:04.924Z time=2026-01-06T14:46:04.923Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 33923"
2026-01-06T14:46:05.173Z time=2026-01-06T14:46:05.173Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37359"
2026-01-06T14:46:05.424Z time=2026-01-06T14:46:05.424Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44197"
2026-01-06T14:46:05.673Z time=2026-01-06T14:46:05.673Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36093"
2026-01-06T14:46:05.923Z time=2026-01-06T14:46:05.923Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 33603"
2026-01-06T14:46:06.173Z time=2026-01-06T14:46:06.173Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42247"
2026-01-06T14:46:06.423Z time=2026-01-06T14:46:06.423Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36061"
2026-01-06T14:46:06.629Z time=2026-01-06T14:46:06.628Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-df8b6415ce11eeaa85d11f8c4288c489aa3818354d9691d71523bcdffb5f2fa8 --port 40931"
2026-01-06T14:46:06.629Z time=2026-01-06T14:46:06.629Z level=INFO source=sched.go:443 msg="system memory" total="16.0 GiB" free="13.4 GiB" free_swap="18.0 GiB"
2026-01-06T14:46:06.629Z time=2026-01-06T14:46:06.629Z level=INFO source=sched.go:450 msg="gpu memory" id=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede library=CUDA available="11.1 GiB" free="11.6 GiB" minimum="457.0 MiB" overhead="0 B"
2026-01-06T14:46:06.629Z time=2026-01-06T14:46:06.629Z level=INFO source=server.go:746 msg="loading model" "model layers"=37 requested=-1
2026-01-06T14:46:06.642Z time=2026-01-06T14:46:06.642Z level=INFO source=runner.go:1405 msg="starting ollama engine"
2026-01-06T14:46:06.642Z time=2026-01-06T14:46:06.642Z level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:40931"
2026-01-06T14:46:06.652Z time=2026-01-06T14:46:06.652Z level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:46:06.690Z time=2026-01-06T14:46:06.690Z level=INFO source=ggml.go:136 msg="" architecture=qwen25vl file_type=Q4_K_M name="" description="" num_tensors=953 num_key_values=36
2026-01-06T14:46:06.695Z load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
2026-01-06T14:46:06.769Z ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
2026-01-06T14:46:06.769Z ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
2026-01-06T14:46:06.769Z ggml_cuda_init: found 1 CUDA devices:
2026-01-06T14:46:06.769Z Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, ID: GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede
2026-01-06T14:46:06.769Z load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
2026-01-06T14:46:06.769Z time=2026-01-06T14:46:06.769Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
2026-01-06T14:46:07.969Z time=2026-01-06T14:46:07.969Z level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=ggml.go:482 msg="offloading 36 repeating layers to GPU"
2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=ggml.go:494 msg="offloaded 37/37 layers to GPU"
2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="3.0 GiB"
2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=device.go:245 msg="model weights" device=CPU size="243.4 MiB"
2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="144.0 MiB"
2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="7.5 GiB"
2026-01-06T14:46:08.917Z time=2026-01-06T14:46:08.916Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="20.3 MiB"
2026-01-06T14:46:08.917Z time=2026-01-06T14:46:08.916Z level=INFO source=device.go:272 msg="total memory" size="10.8 GiB"
2026-01-06T14:46:08.917Z time=2026-01-06T14:46:08.916Z level=INFO source=sched.go:517 msg="loaded runners" count=1
2026-01-06T14:46:08.917Z time=2026-01-06T14:46:08.916Z level=INFO source=server.go:1338 msg="waiting for llama runner to start responding"
2026-01-06T14:46:08.917Z time=2026-01-06T14:46:08.917Z level=INFO source=server.go:1372 msg="waiting for server to become available" status="llm server loading model"
2026-01-06T14:46:09.670Z time=2026-01-06T14:46:09.669Z level=INFO source=server.go:1376 msg="llama runner started in 3.04 seconds"
<!-- gh-comment-id:3714988094 -->
@klemonade commented on GitHub (Jan 6, 2026):

Here is the full log I have reproduced:

```
2026-01-06T13:55:54.804Z Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
2026-01-06T13:55:54.806Z Your new public key is:
2026-01-06T13:55:54.806Z ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIM+Cw6oyZnUE6gThwPnQyAoRmHq995u4zW+EfZDOM1cM
2026-01-06T13:55:54.806Z time=2026-01-06T13:55:54.806Z level=INFO source=routes.go:1554 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:500h0m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
2026-01-06T13:55:54.806Z time=2026-01-06T13:55:54.806Z level=INFO source=images.go:493 msg="total blobs: 0"
2026-01-06T13:55:54.807Z time=2026-01-06T13:55:54.807Z level=INFO source=images.go:500 msg="total unused blobs removed: 0"
2026-01-06T13:55:54.807Z time=2026-01-06T13:55:54.807Z level=INFO source=routes.go:1607 msg="Listening on [::]:11434 (version 0.13.5)"
2026-01-06T13:55:54.807Z time=2026-01-06T13:55:54.807Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
2026-01-06T13:55:54.808Z time=2026-01-06T13:55:54.808Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35083"
2026-01-06T13:55:54.928Z time=2026-01-06T13:55:54.928Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41097"
2026-01-06T13:55:55.021Z time=2026-01-06T13:55:55.020Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
2026-01-06T13:55:55.021Z time=2026-01-06T13:55:55.021Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 33135"
2026-01-06T13:55:55.159Z time=2026-01-06T13:55:55.159Z level=INFO source=types.go:42 msg="inference compute" id=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3060" libdirs=ollama,cuda_v12 driver=12.2 pci_id=0000:07:00.0 type=discrete total="12.0 GiB" available="11.4 GiB"
2026-01-06T13:55:55.159Z time=2026-01-06T13:55:55.159Z level=INFO source=routes.go:1648 msg="entering low vram mode" "total vram"="12.0 GiB" threshold="20.0 GiB"
2026-01-06T13:57:33.748Z [GIN] 2026/01/06 - 13:57:33 | 200 | 45.586µs | 127.0.0.1 | HEAD "/"
2026-01-06T13:57:35.620Z time=2026-01-06T13:57:35.620Z level=INFO source=download.go:177 msg="downloading df8b6415ce11 in 16 200 MB part(s)"
2026-01-06T13:58:07.620Z time=2026-01-06T13:58:07.619Z level=INFO source=download.go:177 msg="downloading a242d8dfdc8f in 1 487 B part(s)"
2026-01-06T13:58:09.208Z time=2026-01-06T13:58:09.208Z level=INFO source=download.go:177 msg="downloading 75357d685f23 in 1 28 B part(s)"
2026-01-06T13:58:11.500Z time=2026-01-06T13:58:11.500Z level=INFO source=download.go:177 msg="downloading 832dd9e00a68 in 1 11 KB part(s)"
2026-01-06T13:58:13.090Z time=2026-01-06T13:58:13.090Z level=INFO source=download.go:177 msg="downloading 401a79d3fd09 in 1 41 B part(s)"
2026-01-06T13:58:14.764Z time=2026-01-06T13:58:14.764Z level=INFO source=download.go:177 msg="downloading 9e7b6c15f976 in 1 567 B part(s)"
2026-01-06T13:58:18.121Z [GIN] 2026/01/06 - 13:58:18 | 200 | 44.372395303s | 127.0.0.1 | POST "/api/pull"
2026-01-06T13:59:29.038Z [GIN] 2026/01/06 - 13:59:29 | 200 | 52.729µs | 10.0.1.116 | GET "/api/version"
2026-01-06T13:59:30.259Z [GIN] 2026/01/06 - 13:59:30 | 200 | 603.037µs | 10.0.1.116 | GET "/api/tags"
2026-01-06T13:59:30.260Z [GIN] 2026/01/06 - 13:59:30 | 200 | 72.116µs | 10.0.1.116 | GET "/api/ps"
2026-01-06T13:59:33.027Z [GIN] 2026/01/06 - 13:59:33 | 200 | 461.259µs | 10.0.1.116 | GET "/api/tags"
2026-01-06T13:59:33.029Z [GIN] 2026/01/06 - 13:59:33 | 200 | 22.692µs | 10.0.1.116 | GET "/api/ps"
2026-01-06T13:59:37.455Z [GIN] 2026/01/06 - 13:59:37 | 200 | 498.91µs | 10.0.1.116 | GET "/api/tags"
2026-01-06T13:59:37.457Z [GIN] 2026/01/06 - 13:59:37 | 200 | 24.917µs | 10.0.1.116 | GET "/api/ps"
2026-01-06T14:00:32.583Z time=2026-01-06T14:00:32.583Z level=INFO source=routes.go:1554 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:500h0m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
2026-01-06T14:00:32.583Z time=2026-01-06T14:00:32.583Z level=INFO source=images.go:493 msg="total blobs: 6"
2026-01-06T14:00:32.583Z time=2026-01-06T14:00:32.583Z level=INFO source=images.go:500 msg="total unused blobs removed: 0"
2026-01-06T14:00:32.584Z time=2026-01-06T14:00:32.584Z level=INFO source=routes.go:1607 msg="Listening on [::]:11434 (version 0.13.5)"
2026-01-06T14:00:32.584Z time=2026-01-06T14:00:32.584Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
2026-01-06T14:00:32.585Z time=2026-01-06T14:00:32.585Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40247"
2026-01-06T14:00:32.714Z time=2026-01-06T14:00:32.714Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 45287"
2026-01-06T14:00:32.814Z time=2026-01-06T14:00:32.814Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
2026-01-06T14:00:32.814Z time=2026-01-06T14:00:32.814Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44955"
2026-01-06T14:00:32.977Z time=2026-01-06T14:00:32.977Z level=INFO source=types.go:42 msg="inference compute" id=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3060" libdirs=ollama,cuda_v12 driver=12.2 pci_id=0000:07:00.0 type=discrete total="12.0 GiB" available="11.4 GiB"
2026-01-06T14:00:32.977Z time=2026-01-06T14:00:32.977Z level=INFO source=routes.go:1648 msg="entering low vram mode" "total vram"="12.0 GiB" threshold="20.0 GiB"
2026-01-06T14:15:14.773Z [GIN] 2026/01/06 - 14:15:14 | 200 | 58.99µs | 161.35.58.159 | GET "/"
2026-01-06T14:15:46.587Z [GIN] 2026/01/06 - 14:15:46 | 200 | 116.718µs | 161.35.58.159 | GET "/api/ps"
2026-01-06T14:15:47.190Z [GIN] 2026/01/06 - 14:15:47 | 200 | 487.411µs | 161.35.58.159 | GET "/v1/models"
2026-01-06T14:15:47.815Z [GIN] 2026/01/06 - 14:15:47 | 200 | 81.399516ms | 161.35.58.159 | POST "/api/show"
2026-01-06T14:41:22.726Z [GIN] 2026/01/06 - 14:41:22 | 200 | 423.094µs | 10.0.1.184 | GET "/api/tags"
2026-01-06T14:43:03.394Z time=2026-01-06T14:43:03.394Z level=INFO source=download.go:177 msg="downloading b36530292268 in 16 156 MB part(s)"
2026-01-06T14:43:26.906Z time=2026-01-06T14:43:26.906Z level=INFO source=download.go:177 msg="downloading 636353bf6b2f in 1 1.4 KB part(s)"
2026-01-06T14:43:28.598Z time=2026-01-06T14:43:28.598Z level=INFO source=download.go:177 msg="downloading d18a5cc71b84 in 1 11 KB part(s)"
2026-01-06T14:43:30.301Z time=2026-01-06T14:43:30.301Z level=INFO source=download.go:177 msg="downloading 25b023c48a6b in 1 111 B part(s)"
2026-01-06T14:43:34.370Z time=2026-01-06T14:43:34.370Z level=INFO source=download.go:177 msg="downloading 9d085367cf15 in 1 487 B part(s)"
2026-01-06T14:43:37.713Z [GIN] 2026/01/06 - 14:43:37 | 200 | 35.904312205s | 10.0.1.184 | POST "/api/pull"
2026-01-06T14:43:37.748Z [GIN] 2026/01/06 - 14:43:37 | 200 | 654.441µs | 10.0.1.184 | GET "/api/tags"
2026-01-06T14:43:37.750Z [GIN] 2026/01/06 - 14:43:37 | 200 | 20.609µs | 10.0.1.184 | GET "/api/ps"
2026-01-06T14:44:30.040Z [GIN] 2026/01/06 - 14:44:30 | 200 | 607.752µs | 10.0.1.184 | GET "/api/tags"
2026-01-06T14:44:30.042Z [GIN] 2026/01/06 - 14:44:30 | 200 | 22.673µs | 10.0.1.184 | GET "/api/ps"
2026-01-06T14:44:32.370Z time=2026-01-06T14:44:32.370Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42529"
2026-01-06T14:44:32.581Z time=2026-01-06T14:44:32.580Z level=INFO source=server.go:245 msg="enabling flash attention"
2026-01-06T14:44:32.581Z time=2026-01-06T14:44:32.581Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-b365302922688f3a3c9ac8e3c00ab97a152cac0cdbf4eb5a734ecb483ae3e511 --port 39805"
2026-01-06T14:44:32.581Z time=2026-01-06T14:44:32.581Z level=INFO source=sched.go:443 msg="system memory" total="16.0 GiB" free="13.5 GiB" free_swap="18.0 GiB"
2026-01-06T14:44:32.581Z time=2026-01-06T14:44:32.581Z level=INFO source=sched.go:450 msg="gpu memory" id=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede library=CUDA available="11.1 GiB" free="11.6 GiB" minimum="457.0 MiB" overhead="0 B"
2026-01-06T14:44:32.581Z time=2026-01-06T14:44:32.581Z level=INFO source=server.go:746 msg="loading model" "model layers"=37 requested=-1
2026-01-06T14:44:32.594Z time=2026-01-06T14:44:32.594Z level=INFO source=runner.go:1405 msg="starting ollama engine"
2026-01-06T14:44:32.594Z time=2026-01-06T14:44:32.594Z level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:39805"
2026-01-06T14:44:32.604Z time=2026-01-06T14:44:32.604Z level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:44:32.637Z time=2026-01-06T14:44:32.637Z level=INFO source=ggml.go:136 msg="" architecture=qwen3 file_type=Q4_K_M name=scb10x/typhoon2.5-qwen3-4b-preview description="" num_tensors=398 num_key_values=28
2026-01-06T14:44:32.642Z load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
2026-01-06T14:44:32.713Z ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
2026-01-06T14:44:32.713Z ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
2026-01-06T14:44:32.713Z ggml_cuda_init: found 1 CUDA devices:
2026-01-06T14:44:32.713Z Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, ID: GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede
2026-01-06T14:44:32.713Z load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
2026-01-06T14:44:32.713Z time=2026-01-06T14:44:32.713Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
2026-01-06T14:44:32.849Z time=2026-01-06T14:44:32.848Z level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.966Z level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.966Z level=INFO source=ggml.go:482 msg="offloading 36 repeating layers to GPU"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=ggml.go:494 msg="offloaded 37/37 layers to GPU"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="2.3 GiB"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=device.go:245 msg="model weights" device=CPU size="304.3 MiB"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="576.0 MiB"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="145.0 MiB"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="5.0 MiB"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=device.go:272 msg="total memory" size="3.3 GiB"
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=sched.go:517 msg="loaded runners" count=1
2026-01-06T14:44:32.967Z time=2026-01-06T14:44:32.967Z level=INFO source=server.go:1338 msg="waiting for llama runner to start responding"
2026-01-06T14:44:32.981Z time=2026-01-06T14:44:32.981Z level=INFO source=server.go:1372 msg="waiting for server to become available" status="llm server loading model"
2026-01-06T14:44:33.482Z time=2026-01-06T14:44:33.482Z level=INFO source=server.go:1376 msg="llama runner started in 0.90 seconds"
2026-01-06T14:44:33.666Z [GIN] 2026/01/06 - 14:44:33 | 200 | 1.433989555s | 10.0.1.184 | POST "/api/chat"
2026-01-06T14:44:34.359Z [GIN] 2026/01/06 - 14:44:34 | 200 | 679.295647ms | 10.0.1.184 | POST "/api/chat"
2026-01-06T14:44:34.713Z [GIN] 2026/01/06 - 14:44:34 | 200 | 351.121022ms | 10.0.1.184 | POST "/api/chat"
2026-01-06T14:44:35.016Z [GIN] 2026/01/06 - 14:44:35 | 200 | 288.62871ms | 10.0.1.184 | POST "/api/chat"
2026-01-06T14:45:32.440Z ggml_backend_cuda_device_get_memory device GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede utilizing NVML memory reporting free: 8989769728 total: 12884901888
2026-01-06T14:45:32.468Z time=2026-01-06T14:45:32.468Z level=INFO source=sched.go:583 msg="updated VRAM based on existing loaded models" gpu=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede library=CUDA total="12.0 GiB" available="8.4 GiB"
2026-01-06T14:45:32.549Z time=2026-01-06T14:45:32.548Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-df8b6415ce11eeaa85d11f8c4288c489aa3818354d9691d71523bcdffb5f2fa8 --port 34109"
2026-01-06T14:45:32.549Z time=2026-01-06T14:45:32.549Z level=INFO source=sched.go:443 msg="system memory" total="16.0 GiB" free="13.0 GiB" free_swap="18.0 GiB"
2026-01-06T14:45:32.549Z time=2026-01-06T14:45:32.549Z level=INFO source=sched.go:450 msg="gpu memory" id=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede library=CUDA available="7.9 GiB" free="8.4 GiB" minimum="457.0 MiB" overhead="0 B"
2026-01-06T14:45:32.549Z time=2026-01-06T14:45:32.549Z level=INFO source=server.go:746 msg="loading model" "model layers"=37 requested=-1
2026-01-06T14:45:32.563Z time=2026-01-06T14:45:32.563Z level=INFO source=runner.go:1405 msg="starting ollama engine"
2026-01-06T14:45:32.563Z time=2026-01-06T14:45:32.563Z level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:34109"
2026-01-06T14:45:32.571Z time=2026-01-06T14:45:32.571Z level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:45:32.609Z time=2026-01-06T14:45:32.609Z level=INFO source=ggml.go:136 msg="" architecture=qwen25vl file_type=Q4_K_M name="" description="" num_tensors=953 num_key_values=36
2026-01-06T14:45:32.615Z load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
2026-01-06T14:45:32.675Z ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
2026-01-06T14:45:32.676Z ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
2026-01-06T14:45:32.676Z ggml_cuda_init: found 1 CUDA devices:
2026-01-06T14:45:32.676Z Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, ID: GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede
2026-01-06T14:45:32.676Z load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
2026-01-06T14:45:32.676Z time=2026-01-06T14:45:32.676Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=server.go:1018 msg="model requires more gpu memory than is currently available, evicting a model to make space" "loaded layers"=9
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=runner.go:1278 msg=load request="{Operation:close LoraPath:[] Parallel:0 BatchSize:0 FlashAttention:Disabled KvSize:0 KvCacheType: NumThreads:0 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="3.0 GiB"
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=device.go:245 msg="model weights" device=CPU size="243.4 MiB"
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="144.0 MiB"
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="7.5 GiB"
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="20.3 MiB"
2026-01-06T14:45:33.856Z time=2026-01-06T14:45:33.856Z level=INFO source=device.go:272 msg="total memory" size="10.8 GiB"
2026-01-06T14:45:33.879Z ggml_backend_cuda_device_get_memory device GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede utilizing NVML memory reporting free: 8851292160 total: 12884901888
2026-01-06T14:45:34.133Z time=2026-01-06T14:45:34.133Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 34319"
2026-01-06T14:45:34.237Z time=2026-01-06T14:45:34.237Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40637"
2026-01-06T14:45:34.423Z time=2026-01-06T14:45:34.423Z level=INFO source=sched.go:443 msg="system memory" total="16.0 GiB" free="12.3 GiB" free_swap="18.0 GiB"
2026-01-06T14:45:34.423Z time=2026-01-06T14:45:34.423Z level=INFO source=sched.go:450 msg="gpu memory" id=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede library=CUDA available="11.0 GiB" free="11.4 GiB" minimum="457.0 MiB" overhead="0 B"
2026-01-06T14:45:34.423Z time=2026-01-06T14:45:34.423Z level=INFO source=server.go:746 msg="loading model" "model layers"=37 requested=-1
2026-01-06T14:45:34.424Z time=2026-01-06T14:45:34.424Z level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
2026-01-06T14:45:35.022Z time=2026-01-06T14:45:35.022Z level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)]
MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" 2026-01-06T14:45:35.929Z time=2026-01-06T14:45:35.929Z level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" 2026-01-06T14:45:35.929Z time=2026-01-06T14:45:35.929Z level=INFO source=ggml.go:482 msg="offloading 36 repeating layers to GPU" 2026-01-06T14:45:35.929Z time=2026-01-06T14:45:35.929Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU" 2026-01-06T14:45:35.929Z time=2026-01-06T14:45:35.929Z level=INFO source=ggml.go:494 msg="offloaded 37/37 layers to GPU" 2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="3.0 GiB" 2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=device.go:245 msg="model weights" device=CPU size="243.4 MiB" 2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="144.0 MiB" 2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="7.5 GiB" 2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="20.3 MiB" 2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=device.go:272 msg="total memory" size="10.8 GiB" 2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=sched.go:517 msg="loaded runners" count=1 2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.929Z level=INFO source=server.go:1338 msg="waiting for llama runner to start responding" 2026-01-06T14:45:35.930Z time=2026-01-06T14:45:35.930Z level=INFO source=server.go:1372 msg="waiting for server to become available" status="llm server 
loading model" 2026-01-06T14:45:36.683Z time=2026-01-06T14:45:36.683Z level=INFO source=server.go:1376 msg="llama runner started in 4.13 seconds" 2026-01-06T14:46:00.369Z ggml.c:4081: GGML_ASSERT(a->ne[2] * 4 == b->ne[0]) failed 2026-01-06T14:46:00.400Z /usr/bin/ollama(+0x110c8d8)[0x57b24d04d8d8] 2026-01-06T14:46:00.400Z /usr/bin/ollama(+0x110ccb7)[0x57b24d04dcb7] 2026-01-06T14:46:00.400Z /usr/bin/ollama(+0x110ce3d)[0x57b24d04de3d] 2026-01-06T14:46:00.400Z /usr/bin/ollama(+0x1114f1e)[0x57b24d055f1e] 2026-01-06T14:46:00.400Z /usr/bin/ollama(+0x10c4011)[0x57b24d005011] 2026-01-06T14:46:00.400Z /usr/bin/ollama(+0x37e681)[0x57b24c2bf681] 2026-01-06T14:46:00.428Z SIGABRT: abort 2026-01-06T14:46:00.428Z PC=0x759a3b1c3b2c m=7 sigcode=18446744073709551610 2026-01-06T14:46:00.428Z signal arrived during cgo execution 2026-01-06T14:46:00.428Z goroutine 8 gp=0xc000582700 m=7 mp=0xc000580008 [syscall]: 2026-01-06T14:46:00.428Z runtime.cgocall(0x57b24d004fc0, 0xc000b03138) 2026-01-06T14:46:00.428Z runtime/cgocall.go:167 +0x4b fp=0xc000b03110 sp=0xc000b030d8 pc=0x57b24c2b46eb 2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_rope_multi(0x7599d85fa540, 0x7599d88f5e10, 0x7599d88f5ca0, 0x0, 0x80, 0xc0174882d0, 0x8, 0x20000, 0x49742400, 0x3f800000, ...) 2026-01-06T14:46:00.428Z _cgo_gotypes.go:2066 +0x4b fp=0xc000b03138 sp=0xc000b03110 pc=0x57b24c6ee8ab 2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/backend/ggml.(*Tensor).RoPE.func2(...) 
2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/backend/ggml/ggml.go:1543
2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/backend/ggml.(*Tensor).RoPE(0xc0006622a0, {0x57b24d878250, 0xc0002e2e40}, {0x57b24d882b20, 0xc000662288}, 0x80, 0x49742400, 0x3f800000, {0xc00100f530, 0x1, ...})
2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/backend/ggml/ggml.go:1543 +0x61a fp=0xc000b03290 sp=0xc000b03138 pc=0x57b24c6fe97a
2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/nn.RoPE({0x57b24d878250?, 0xc0002e2e40?}, {0x57b24d882b20?, 0xc0006622a0?}, {0x57b24d882b20?, 0xc000662288?}, 0x57b24c6fe096?, 0xd85fa540?, 0x7599?, {0xc00100f530, ...})
2026-01-06T14:46:00.428Z github.com/ollama/ollama/ml/nn/rope.go:16 +0x86 fp=0xc000b032f0 sp=0xc000b03290 pc=0x57b24c7317c6
2026-01-06T14:46:00.428Z github.com/ollama/ollama/model/models/qwen25vl.TextOptions.applyRotaryPositionEmbeddings({0x800, 0x10, 0x2, 0x80, 0x1f400, 0x358637bd, 0x49742400, 0x3f800000, {0xc003e67860, 0x3, ...}}, ...)
2026-01-06T14:46:00.428Z github.com/ollama/ollama/model/models/qwen25vl/model_text.go:21 +0x172 fp=0xc000b03378 sp=0xc000b032f0 pc=0x57b24c7b84b2
2026-01-06T14:46:00.428Z github.com/ollama/ollama/model/models/qwen25vl.(*TextModel).Shift(...)
2026-01-06T14:46:00.428Z github.com/ollama/ollama/model/models/qwen25vl/model_text.go:94
2026-01-06T14:46:00.428Z github.com/ollama/ollama/model/models/qwen25vl.(*TextModel).Shift-fm({0x57b24d878250?, 0xc0002e2e40?}, 0xc0002e2e40?, {0x57b24d882b20?, 0xc0006622a0?}, {0x57b24d882b20?, 0xc000662288?})
2026-01-06T14:46:00.428Z <autogenerated>:1 +0x14f fp=0xc000b03460 sp=0xc000b03378 pc=0x57b24c7bdf4f
2026-01-06T14:46:00.428Z github.com/ollama/ollama/kvcache.(*Causal).shift(0xc0001f0800, 0x0, 0x4, 0xfffff4c4)
2026-01-06T14:46:00.428Z github.com/ollama/ollama/kvcache/causal.go:599 +0x507 fp=0xc000b035c0 sp=0xc000b03460 pc=0x57b24c6e56c7
2026-01-06T14:46:00.428Z github.com/ollama/ollama/kvcache.(*Causal).Remove(0xc0001f0800, 0x0, 0x4, 0xb40)
2026-01-06T14:46:00.428Z github.com/ollama/ollama/kvcache/causal.go:659 +0x285 fp=0xc000b03658 sp=0xc000b035c0 pc=0x57b24c6e5aa5
2026-01-06T14:46:00.428Z github.com/ollama/ollama/runner/ollamarunner.(*InputCache).ShiftCacheSlot(0xc0043ad140, 0xc0043ad100, 0x4)
2026-01-06T14:46:00.428Z github.com/ollama/ollama/runner/ollamarunner/cache.go:290 +0x34c fp=0xc000b037f0 sp=0xc000b03658 pc=0x57b24c7cc8ec
2026-01-06T14:46:00.428Z github.com/ollama/ollama/runner/ollamarunner.(*Server).forwardBatch(_, {0x4bd, {0x57b24d878250, 0xc0002e2c80}, {0x57b24d882b20, 0xc000662228}, {0xc00100f2d8, 0x1, 0x1}, {{0x57b24d882b20, ...}, ...}, ...})
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner/runner.go:565 +0xec5 fp=0xc000b03b58 sp=0xc000b037f0 pc=0x57b24c7d0085
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000226f00, {0x57b24d86d410, 0xc000533540})
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner/runner.go:452 +0x18c fp=0xc000b03fb8 sp=0xc000b03b58 pc=0x57b24c7cef6c
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner/runner.go:1418 +0x28 fp=0xc000b03fe0 sp=0xc000b03fb8 pc=0x57b24c7d8668
2026-01-06T14:46:00.429Z runtime.goexit({})
2026-01-06T14:46:00.429Z runtime/asm_amd64.s:1700 +0x1 fp=0xc000b03fe8 sp=0xc000b03fe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.429Z created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner/runner.go:1418 +0x4c9
2026-01-06T14:46:00.429Z goroutine 1 gp=0xc000002380 m=nil [IO wait]:
2026-01-06T14:46:00.429Z runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
2026-01-06T14:46:00.429Z runtime/proc.go:435 +0xce fp=0xc000b05790 sp=0xc000b05770 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.429Z runtime.netpollblock(0xc00051f7e0?, 0x4c2512a6?, 0xb2?)
2026-01-06T14:46:00.429Z runtime/netpoll.go:575 +0xf7 fp=0xc000b057c8 sp=0xc000b05790 pc=0x57b24c27ce97
2026-01-06T14:46:00.429Z internal/poll.runtime_pollWait(0x759a3ae56eb0, 0x72)
2026-01-06T14:46:00.429Z runtime/netpoll.go:351 +0x85 fp=0xc000b057e8 sp=0xc000b057c8 pc=0x57b24c2b6d85
2026-01-06T14:46:00.429Z internal/poll.(*pollDesc).wait(0xc00062ff00?, 0x900000036?, 0x0)
2026-01-06T14:46:00.429Z internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000b05810 sp=0xc000b057e8 pc=0x57b24c33ef07
2026-01-06T14:46:00.429Z internal/poll.(*pollDesc).waitRead(...)
2026-01-06T14:46:00.429Z internal/poll/fd_poll_runtime.go:89
2026-01-06T14:46:00.429Z internal/poll.(*FD).Accept(0xc00062ff00)
2026-01-06T14:46:00.429Z internal/poll/fd_unix.go:620 +0x295 fp=0xc000b058b8 sp=0xc000b05810 pc=0x57b24c3442d5
2026-01-06T14:46:00.429Z net.(*netFD).accept(0xc00062ff00)
2026-01-06T14:46:00.429Z net/fd_unix.go:172 +0x29 fp=0xc000b05970 sp=0xc000b058b8 pc=0x57b24c3b71a9
2026-01-06T14:46:00.429Z net.(*TCPListener).accept(0xc000415480)
2026-01-06T14:46:00.429Z net/tcpsock_posix.go:159 +0x1b fp=0xc000b059c0 sp=0xc000b05970 pc=0x57b24c3ccb5b
2026-01-06T14:46:00.429Z net.(*TCPListener).Accept(0xc000415480)
2026-01-06T14:46:00.429Z net/tcpsock.go:380 +0x30 fp=0xc000b059f0 sp=0xc000b059c0 pc=0x57b24c3cba10
2026-01-06T14:46:00.429Z net/http.(*onceCloseListener).Accept(0xc0004b43f0?)
2026-01-06T14:46:00.429Z <autogenerated>:1 +0x24 fp=0xc000b05a08 sp=0xc000b059f0 pc=0x57b24c5e31e4
2026-01-06T14:46:00.429Z net/http.(*Server).Serve(0xc00050ef00, {0x57b24d86adc0, 0xc000415480})
2026-01-06T14:46:00.429Z net/http/server.go:3424 +0x30c fp=0xc000b05b38 sp=0xc000b05a08 pc=0x57b24c5baaac
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner.Execute({0xc0000340a0, 0x4, 0x4})
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/ollamarunner/runner.go:1441 +0x94e fp=0xc000b05d08 sp=0xc000b05b38 pc=0x57b24c7d83ee
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner.Execute({0xc000034080?, 0x0?, 0x0?})
2026-01-06T14:46:00.429Z github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc000b05d30 sp=0xc000b05d08 pc=0x57b24c7d8ce9
2026-01-06T14:46:00.429Z github.com/ollama/ollama/cmd.NewCLI.func2(0xc00050ed00?, {0x57b24d34d0ad?, 0x4?, 0x57b24d34d0b1?})
2026-01-06T14:46:00.429Z github.com/ollama/ollama/cmd/cmd.go:1841 +0x45 fp=0xc000b05d58 sp=0xc000b05d30 pc=0x57b24cf95f25
2026-01-06T14:46:00.429Z github.com/spf13/cobra.(*Command).execute(0xc00067b808, {0xc0005334a0, 0x5, 0x5})
2026-01-06T14:46:00.429Z github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000b05e78 sp=0xc000b05d58 pc=0x57b24c4307fc
2026-01-06T14:46:00.429Z github.com/spf13/cobra.(*Command).ExecuteC(0xc00054c908)
2026-01-06T14:46:00.429Z github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000b05f30 sp=0xc000b05e78 pc=0x57b24c431045
2026-01-06T14:46:00.429Z github.com/spf13/cobra.(*Command).Execute(...)
2026-01-06T14:46:00.429Z github.com/spf13/cobra@v1.7.0/command.go:992
2026-01-06T14:46:00.429Z github.com/spf13/cobra.(*Command).ExecuteContext(...)
2026-01-06T14:46:00.429Z github.com/spf13/cobra@v1.7.0/command.go:985
2026-01-06T14:46:00.429Z main.main()
2026-01-06T14:46:00.429Z github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000b05f50 sp=0xc000b05f30 pc=0x57b24cf96a0d
2026-01-06T14:46:00.429Z runtime.main()
2026-01-06T14:46:00.429Z runtime/proc.go:283 +0x29d fp=0xc000b05fe0 sp=0xc000b05f50 pc=0x57b24c28451d
2026-01-06T14:46:00.429Z runtime.goexit({})
2026-01-06T14:46:00.429Z runtime/asm_amd64.s:1700 +0x1 fp=0xc000b05fe8 sp=0xc000b05fe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.429Z goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
2026-01-06T14:46:00.429Z runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
2026-01-06T14:46:00.429Z runtime/proc.go:435 +0xce fp=0xc000072fa8 sp=0xc000072f88 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.429Z runtime.goparkunlock(...)
2026-01-06T14:46:00.429Z runtime/proc.go:441
2026-01-06T14:46:00.429Z runtime.forcegchelper()
2026-01-06T14:46:00.429Z runtime/proc.go:348 +0xb8 fp=0xc000072fe0 sp=0xc000072fa8 pc=0x57b24c284858
2026-01-06T14:46:00.429Z runtime.goexit({})
2026-01-06T14:46:00.429Z runtime/asm_amd64.s:1700 +0x1 fp=0xc000072fe8 sp=0xc000072fe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.429Z created by runtime.init.7 in goroutine 1
2026-01-06T14:46:00.429Z runtime/proc.go:336 +0x1a
2026-01-06T14:46:00.429Z goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
2026-01-06T14:46:00.429Z runtime.gopark(0x57b24e13f701?, 0x0?, 0x0?, 0x0?, 0x0?)
2026-01-06T14:46:00.429Z runtime/proc.go:435 +0xce fp=0xc000073780 sp=0xc000073760 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.429Z runtime.goparkunlock(...)
2026-01-06T14:46:00.429Z runtime/proc.go:441
2026-01-06T14:46:00.429Z runtime.bgsweep(0xc00007e000)
2026-01-06T14:46:00.429Z runtime/mgcsweep.go:316 +0xdf fp=0xc0000737c8 sp=0xc000073780 pc=0x57b24c26efff
2026-01-06T14:46:00.429Z runtime.gcenable.gowrap1()
2026-01-06T14:46:00.429Z runtime/mgc.go:204 +0x25 fp=0xc0000737e0 sp=0xc0000737c8 pc=0x57b24c2633e5
2026-01-06T14:46:00.429Z runtime.goexit({})
2026-01-06T14:46:00.429Z runtime/asm_amd64.s:1700 +0x1 fp=0xc0000737e8 sp=0xc0000737e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.429Z created by runtime.gcenable in goroutine 1
2026-01-06T14:46:00.429Z runtime/mgc.go:204 +0x66
2026-01-06T14:46:00.429Z goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
2026-01-06T14:46:00.429Z runtime.gopark(0x6cfcb6?, 0x6931a1?, 0x0?, 0x0?, 0x0?)
2026-01-06T14:46:00.429Z runtime/proc.go:435 +0xce fp=0xc000073f78 sp=0xc000073f58 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.429Z runtime.goparkunlock(...)
2026-01-06T14:46:00.429Z runtime/proc.go:441
2026-01-06T14:46:00.429Z runtime.(*scavengerState).park(0x57b24e141280)
2026-01-06T14:46:00.429Z runtime/mgcscavenge.go:425 +0x49 fp=0xc000073fa8 sp=0xc000073f78 pc=0x57b24c26ca49
2026-01-06T14:46:00.429Z runtime.bgscavenge(0xc00007e000)
2026-01-06T14:46:00.429Z runtime/mgcscavenge.go:658 +0x59 fp=0xc000073fc8 sp=0xc000073fa8 pc=0x57b24c26cfd9
2026-01-06T14:46:00.429Z runtime.gcenable.gowrap2()
2026-01-06T14:46:00.429Z runtime/mgc.go:205 +0x25 fp=0xc000073fe0 sp=0xc000073fc8 pc=0x57b24c263385
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc000073fe8 sp=0xc000073fe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.gcenable in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:205 +0xa5
2026-01-06T14:46:00.430Z goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
2026-01-06T14:46:00.430Z runtime.gopark(0x1b8?, 0x57b24d849020?, 0x1?, 0x23?, 0x57b24c2bda14?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc000072630 sp=0xc000072610 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.runfinq()
2026-01-06T14:46:00.430Z runtime/mfinal.go:196 +0x107 fp=0xc0000727e0 sp=0xc000072630 pc=0x57b24c2623a7
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.createfing in goroutine 1
2026-01-06T14:46:00.430Z runtime/mfinal.go:166 +0x3d
2026-01-06T14:46:00.430Z goroutine 6 gp=0xc0001ce8c0 m=nil [chan receive]:
2026-01-06T14:46:00.430Z runtime.gopark(0xc000223680?, 0xc001002018?, 0x60?, 0x47?, 0x57b24c39dde8?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc000074718 sp=0xc0000746f8 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.chanrecv(0xc00003e380, 0x0, 0x1)
2026-01-06T14:46:00.430Z runtime/chan.go:664 +0x445 fp=0xc000074790 sp=0xc000074718 pc=0x57b24c253e85
2026-01-06T14:46:00.430Z runtime.chanrecv1(0x0?, 0x0?)
2026-01-06T14:46:00.430Z runtime/chan.go:506 +0x12 fp=0xc0000747b8 sp=0xc000074790 pc=0x57b24c253a12
2026-01-06T14:46:00.430Z runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
2026-01-06T14:46:00.430Z runtime/mgc.go:1796
2026-01-06T14:46:00.430Z runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1799 +0x2f fp=0xc0000747e0 sp=0xc0000747b8 pc=0x57b24c26658f
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc0000747e8 sp=0xc0000747e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by unique.runtime_registerUniqueMapCleanup in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:1794 +0x85
2026-01-06T14:46:00.430Z goroutine 7 gp=0xc0001cee00 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.430Z runtime.gopark(0x2c05bd5c2bf5?, 0x3?, 0x6d?, 0x2?, 0x0?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc000074f38 sp=0xc000074f18 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.430Z runtime/mgc.go:1423 +0xe9 fp=0xc000074fc8 sp=0xc000074f38 pc=0x57b24c2658a9
2026-01-06T14:46:00.430Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x25 fp=0xc000074fe0 sp=0xc000074fc8 pc=0x57b24c265785
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.430Z goroutine 18 gp=0xc000102380 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.430Z runtime.gopark(0x2c05fd6a3f20?, 0x3?, 0x65?, 0xe?, 0x0?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc00006e738 sp=0xc00006e718 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.430Z runtime/mgc.go:1423 +0xe9 fp=0xc00006e7c8 sp=0xc00006e738 pc=0x57b24c2658a9
2026-01-06T14:46:00.430Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x25 fp=0xc00006e7e0 sp=0xc00006e7c8 pc=0x57b24c265785
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00006e7e8 sp=0xc00006e7e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.430Z goroutine 19 gp=0xc000102540 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.430Z runtime.gopark(0x2c05fd6b0afd?, 0x1?, 0xdf?, 0xc?, 0x0?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc00006ef38 sp=0xc00006ef18 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.430Z runtime/mgc.go:1423 +0xe9 fp=0xc00006efc8 sp=0xc00006ef38 pc=0x57b24c2658a9
2026-01-06T14:46:00.430Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x25 fp=0xc00006efe0 sp=0xc00006efc8 pc=0x57b24c265785
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00006efe8 sp=0xc00006efe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.430Z goroutine 20 gp=0xc000102700 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.430Z runtime.gopark(0x2c05fd5f8e80?, 0x3?, 0xc9?, 0xd7?, 0x0?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc00006f738 sp=0xc00006f718 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.430Z runtime/mgc.go:1423 +0xe9 fp=0xc00006f7c8 sp=0xc00006f738 pc=0x57b24c2658a9
2026-01-06T14:46:00.430Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x25 fp=0xc00006f7e0 sp=0xc00006f7c8 pc=0x57b24c265785
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00006f7e8 sp=0xc00006f7e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.430Z goroutine 21 gp=0xc0001028c0 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.430Z runtime.gopark(0x57b24e20f680?, 0x1?, 0x59?, 0x11?, 0x0?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc00006ff38 sp=0xc00006ff18 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.430Z runtime/mgc.go:1423 +0xe9 fp=0xc00006ffc8 sp=0xc00006ff38 pc=0x57b24c2658a9
2026-01-06T14:46:00.430Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x25 fp=0xc00006ffe0 sp=0xc00006ffc8 pc=0x57b24c265785
2026-01-06T14:46:00.430Z runtime.goexit({})
2026-01-06T14:46:00.430Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00006ffe8 sp=0xc00006ffe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.430Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.430Z goroutine 34 gp=0xc000504000 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.430Z runtime.gopark(0x2c05fd78f15d?, 0x1?, 0xb6?, 0xf9?, 0x0?)
2026-01-06T14:46:00.430Z runtime/proc.go:435 +0xce fp=0xc00050a738 sp=0xc00050a718 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.430Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.430Z runtime/mgc.go:1423 +0xe9 fp=0xc00050a7c8 sp=0xc00050a738 pc=0x57b24c2658a9
2026-01-06T14:46:00.430Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.430Z runtime/mgc.go:1339 +0x25 fp=0xc00050a7e0 sp=0xc00050a7c8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050a7e8 sp=0xc00050a7e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 35 gp=0xc0005041c0 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.431Z runtime.gopark(0x2c05fd6aa65e?, 0x3?, 0xbf?, 0x72?, 0x0?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc00050af38 sp=0xc00050af18 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.431Z runtime/mgc.go:1423 +0xe9 fp=0xc00050afc8 sp=0xc00050af38 pc=0x57b24c2658a9
2026-01-06T14:46:00.431Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x25 fp=0xc00050afe0 sp=0xc00050afc8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050afe8 sp=0xc00050afe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 36 gp=0xc000504380 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.431Z runtime.gopark(0x2c05fd5ce92a?, 0x3?, 0x4c?, 0x6f?, 0x0?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc00050b738 sp=0xc00050b718 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.431Z runtime/mgc.go:1423 +0xe9 fp=0xc00050b7c8 sp=0xc00050b738 pc=0x57b24c2658a9
2026-01-06T14:46:00.431Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x25 fp=0xc00050b7e0 sp=0xc00050b7c8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050b7e8 sp=0xc00050b7e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 37 gp=0xc000504540 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.431Z runtime.gopark(0x2c05fd6900a2?, 0x1?, 0xa1?, 0xf2?, 0x0?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc00050bf38 sp=0xc00050bf18 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.431Z runtime/mgc.go:1423 +0xe9 fp=0xc00050bfc8 sp=0xc00050bf38 pc=0x57b24c2658a9
2026-01-06T14:46:00.431Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x25 fp=0xc00050bfe0 sp=0xc00050bfc8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050bfe8 sp=0xc00050bfe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 38 gp=0xc000504700 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.431Z runtime.gopark(0x2c05fd78c896?, 0x3?, 0xe3?, 0x96?, 0x0?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc00050c738 sp=0xc00050c718 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.431Z runtime/mgc.go:1423 +0xe9 fp=0xc00050c7c8 sp=0xc00050c738 pc=0x57b24c2658a9
2026-01-06T14:46:00.431Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x25 fp=0xc00050c7e0 sp=0xc00050c7c8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050c7e8 sp=0xc00050c7e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 39 gp=0xc0005048c0 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.431Z runtime.gopark(0x57b24e20f680?, 0x1?, 0xf3?, 0xc?, 0x0?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc00050cf38 sp=0xc00050cf18 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.431Z runtime/mgc.go:1423 +0xe9 fp=0xc00050cfc8 sp=0xc00050cf38 pc=0x57b24c2658a9
2026-01-06T14:46:00.431Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x25 fp=0xc00050cfe0 sp=0xc00050cfc8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050cfe8 sp=0xc00050cfe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 40 gp=0xc000504a80 m=nil [GC worker (idle)]:
2026-01-06T14:46:00.431Z runtime.gopark(0x2c05fd6a2986?, 0x3?, 0xae?, 0x20?, 0x0?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc00050d738 sp=0xc00050d718 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.gcBgMarkWorker(0xc00003f960)
2026-01-06T14:46:00.431Z runtime/mgc.go:1423 +0xe9 fp=0xc00050d7c8 sp=0xc00050d738 pc=0x57b24c2658a9
2026-01-06T14:46:00.431Z runtime.gcBgMarkStartWorkers.gowrap1()
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x25 fp=0xc00050d7e0 sp=0xc00050d7c8 pc=0x57b24c265785
2026-01-06T14:46:00.431Z runtime.goexit({})
2026-01-06T14:46:00.431Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00050d7e8 sp=0xc00050d7e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.431Z created by runtime.gcBgMarkStartWorkers in goroutine 1
2026-01-06T14:46:00.431Z runtime/mgc.go:1339 +0x105
2026-01-06T14:46:00.431Z goroutine 9 gp=0xc0005828c0 m=nil [select]:
2026-01-06T14:46:00.431Z runtime.gopark(0xc016187a08?, 0x2?, 0x4?, 0x0?, 0xc01618786c?)
2026-01-06T14:46:00.431Z runtime/proc.go:435 +0xce fp=0xc016187698 sp=0xc016187678 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.431Z runtime.selectgo(0xc016187a08, 0xc016187868, 0xc000552e40?, 0x0, 0x1?, 0x1)
2026-01-06T14:46:00.431Z runtime/select.go:351 +0x837 fp=0xc0161877d0 sp=0xc016187698 pc=0x57b24c296a17
2026-01-06T14:46:00.431Z github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc000226f00, {0x57b24d86afa0, 0xc000aee000}, 0xc000494000)
2026-01-06T14:46:00.431Z github.com/ollama/ollama/runner/ollamarunner/runner.go:950 +0xc4e fp=0xc016187ac0 sp=0xc0161877d0 pc=0x57b24c7d368e
2026-01-06T14:46:00.431Z github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x57b24d86afa0?, 0xc000aee000?}, 0xc000049b40?)
2026-01-06T14:46:00.431Z <autogenerated>:1 +0x36 fp=0xc016187af0 sp=0xc016187ac0 pc=0x57b24c7d8b56
2026-01-06T14:46:00.431Z net/http.HandlerFunc.ServeHTTP(0xc000538a80?, {0x57b24d86afa0?, 0xc000aee000?}, 0xc000049b60?)
2026-01-06T14:46:00.432Z net/http/server.go:2294 +0x29 fp=0xc016187b18 sp=0xc016187af0 pc=0x57b24c5b70e9
2026-01-06T14:46:00.432Z net/http.(*ServeMux).ServeHTTP(0x57b24c25c8c5?, {0x57b24d86afa0, 0xc000aee000}, 0xc000494000)
2026-01-06T14:46:00.432Z net/http/server.go:2822 +0x1c4 fp=0xc016187b68 sp=0xc016187b18 pc=0x57b24c5b8fe4
2026-01-06T14:46:00.432Z net/http.serverHandler.ServeHTTP({0x57b24d867590?}, {0x57b24d86afa0?, 0xc000aee000?}, 0x1?)
2026-01-06T14:46:00.432Z net/http/server.go:3301 +0x8e fp=0xc016187b98 sp=0xc016187b68 pc=0x57b24c5d6a6e
2026-01-06T14:46:00.432Z net/http.(*conn).serve(0xc0004b43f0, {0x57b24d86d3d8, 0xc00021c2a0})
2026-01-06T14:46:00.432Z net/http/server.go:2102 +0x625 fp=0xc016187fb8 sp=0xc016187b98 pc=0x57b24c5b55e5
2026-01-06T14:46:00.432Z net/http.(*Server).Serve.gowrap3()
2026-01-06T14:46:00.432Z net/http/server.go:3454 +0x28 fp=0xc016187fe0 sp=0xc016187fb8 pc=0x57b24c5baea8
2026-01-06T14:46:00.432Z runtime.goexit({})
2026-01-06T14:46:00.432Z runtime/asm_amd64.s:1700 +0x1 fp=0xc016187fe8 sp=0xc016187fe0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.432Z created by net/http.(*Server).Serve in goroutine 1
2026-01-06T14:46:00.432Z net/http/server.go:3454 +0x485
2026-01-06T14:46:00.432Z goroutine 830 gp=0xc000582fc0 m=nil [IO wait]:
2026-01-06T14:46:00.432Z runtime.gopark(0xff800000ff800000?, 0xff800000ff800000?, 0x0?, 0x0?, 0xb?)
2026-01-06T14:46:00.432Z runtime/proc.go:435 +0xce fp=0xc0009875d8 sp=0xc0009875b8 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.432Z runtime.netpollblock(0x57b24c2db338?, 0x4c2512a6?, 0xb2?)
2026-01-06T14:46:00.432Z runtime/netpoll.go:575 +0xf7 fp=0xc000987610 sp=0xc0009875d8 pc=0x57b24c27ce97
2026-01-06T14:46:00.432Z internal/poll.runtime_pollWait(0x759a3ae56d98, 0x72)
2026-01-06T14:46:00.432Z runtime/netpoll.go:351 +0x85 fp=0xc000987630 sp=0xc000987610 pc=0x57b24c2b6d85
2026-01-06T14:46:00.432Z internal/poll.(*pollDesc).wait(0xc0001ca000?, 0xc000272041?, 0x0)
2026-01-06T14:46:00.432Z internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000987658 sp=0xc000987630 pc=0x57b24c33ef07
2026-01-06T14:46:00.432Z internal/poll.(*pollDesc).waitRead(...)
2026-01-06T14:46:00.432Z internal/poll/fd_poll_runtime.go:89
2026-01-06T14:46:00.432Z internal/poll.(*FD).Read(0xc0001ca000, {0xc000272041, 0x1, 0x1})
2026-01-06T14:46:00.432Z internal/poll/fd_unix.go:165 +0x27a fp=0xc0009876f0 sp=0xc000987658 pc=0x57b24c3401fa
2026-01-06T14:46:00.432Z net.(*netFD).Read(0xc0001ca000, {0xc000272041?, 0xc00053c4d8?, 0xc000987770?})
2026-01-06T14:46:00.432Z net/fd_posix.go:55 +0x25 fp=0xc000987738 sp=0xc0009876f0 pc=0x57b24c3b5205
2026-01-06T14:46:00.432Z net.(*conn).Read(0xc000076658, {0xc000272041?, 0xc0043ad840?, 0x57b24c620500?})
2026-01-06T14:46:00.432Z net/net.go:194 +0x45 fp=0xc000987780 sp=0xc000987738 pc=0x57b24c3c35c5
2026-01-06T14:46:00.432Z net/http.(*connReader).backgroundRead(0xc000272030)
2026-01-06T14:46:00.432Z net/http/server.go:690 +0x37 fp=0xc0009877c8 sp=0xc000987780 pc=0x57b24c5af4b7
2026-01-06T14:46:00.432Z net/http.(*connReader).startBackgroundRead.gowrap2()
2026-01-06T14:46:00.432Z net/http/server.go:686 +0x25 fp=0xc0009877e0 sp=0xc0009877c8 pc=0x57b24c5af3e5
2026-01-06T14:46:00.432Z runtime.goexit({})
2026-01-06T14:46:00.432Z runtime/asm_amd64.s:1700 +0x1 fp=0xc0009877e8 sp=0xc0009877e0 pc=0x57b24c2bfa01
2026-01-06T14:46:00.432Z created by net/http.(*connReader).startBackgroundRead in goroutine 9
2026-01-06T14:46:00.432Z net/http/server.go:686 +0xb6
2026-01-06T14:46:00.432Z goroutine 3450 gp=0xc0001cf880 m=nil [sync.Mutex.Lock]:
2026-01-06T14:46:00.432Z runtime.gopark(0xc00055b008?, 0xc00012de90?, 0x60?, 0x98?, 0x57b24c2b5679?)
2026-01-06T14:46:00.432Z runtime/proc.go:435 +0xce fp=0xc00008aa88 sp=0xc00008aa68 pc=0x57b24c2b7b6e
2026-01-06T14:46:00.432Z runtime.goparkunlock(...)
2026-01-06T14:46:00.432Z runtime/proc.go:441
2026-01-06T14:46:00.432Z runtime.semacquire1(0xc000226ffc, 0x0, 0x3, 0x2, 0x15)
2026-01-06T14:46:00.432Z runtime/sema.go:188 +0x229 fp=0xc00008aaf0 sp=0xc00008aa88 pc=0x57b24c297ae9
2026-01-06T14:46:00.432Z internal/sync.runtime_SemacquireMutex(0x57b24c666e74?, 0xd8?, 0xc00012de90?)
2026-01-06T14:46:00.432Z runtime/sema.go:95 +0x25 fp=0xc00008ab28 sp=0xc00008aaf0 pc=0x57b24c2b9385 2026-01-06T14:46:00.432Z internal/sync.(*Mutex).lockSlow(0xc000226ff8) 2026-01-06T14:46:00.432Z internal/sync/mutex.go:149 +0x15d fp=0xc00008ab78 sp=0xc00008ab28 pc=0x57b24c2c94dd 2026-01-06T14:46:00.432Z internal/sync.(*Mutex).Lock(...) 2026-01-06T14:46:00.432Z internal/sync/mutex.go:70 2026-01-06T14:46:00.432Z sync.(*Mutex).Lock(...) 2026-01-06T14:46:00.432Z sync/mutex.go:46 2026-01-06T14:46:00.432Z github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0xc000226f00, {0x4bd, {0x57b24d878250, 0xc0002e2c80}, {0x57b24d882b20, 0xc000662228}, {0xc00100f2d8, 0x1, 0x1}, {{0x57b24d882b20, ...}, ...}, ...}) 2026-01-06T14:46:00.432Z github.com/ollama/ollama/runner/ollamarunner/runner.go:735 +0x972 fp=0xc00008aef0 sp=0xc00008ab78 pc=0x57b24c7d1692 2026-01-06T14:46:00.432Z github.com/ollama/ollama/runner/ollamarunner.(*Server).run.gowrap1() 2026-01-06T14:46:00.432Z github.com/ollama/ollama/runner/ollamarunner/runner.go:458 +0x58 fp=0xc00008afe0 sp=0xc00008aef0 pc=0x57b24c7cf198 2026-01-06T14:46:00.432Z runtime.goexit({}) 2026-01-06T14:46:00.432Z runtime/asm_amd64.s:1700 +0x1 fp=0xc00008afe8 sp=0xc00008afe0 pc=0x57b24c2bfa01 2026-01-06T14:46:00.432Z created by github.com/ollama/ollama/runner/ollamarunner.(*Server).run in goroutine 8 2026-01-06T14:46:00.432Z github.com/ollama/ollama/runner/ollamarunner/runner.go:458 +0x2cd 2026-01-06T14:46:00.432Z rax 0x0 2026-01-06T14:46:00.432Z rbx 0x95 2026-01-06T14:46:00.432Z rcx 0x759a3b1c3b2c 2026-01-06T14:46:00.432Z rdx 0x6 2026-01-06T14:46:00.432Z rdi 0x8f 2026-01-06T14:46:00.432Z rsi 0x95 2026-01-06T14:46:00.432Z rbp 0x7599f252a330 2026-01-06T14:46:00.432Z rsp 0x7599f252a2f0 2026-01-06T14:46:00.432Z r8 0x0 2026-01-06T14:46:00.432Z r9 0x7 2026-01-06T14:46:00.432Z r10 0x8 2026-01-06T14:46:00.432Z r11 0x246 2026-01-06T14:46:00.432Z r12 0x6 2026-01-06T14:46:00.432Z r13 0x57b24d54cfdc 2026-01-06T14:46:00.432Z r14 0x16 
2026-01-06T14:46:00.432Z r15 0x49742400 2026-01-06T14:46:00.432Z rip 0x759a3b1c3b2c 2026-01-06T14:46:00.432Z rflags 0x246 2026-01-06T14:46:00.432Z cs 0x33 2026-01-06T14:46:00.432Z fs 0x0 2026-01-06T14:46:00.432Z gs 0x0 2026-01-06T14:46:00.579Z time=2026-01-06T14:46:00.579Z level=ERROR source=server.go:302 msg="llama runner terminated" error="exit status 2" 2026-01-06T14:46:00.579Z [GIN] 2026/01/06 - 14:46:00 | 500 | 28.388806197s | 10.0.1.232 | POST "/v1/chat/completions" 2026-01-06T14:46:01.292Z time=2026-01-06T14:46:01.292Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42293" 2026-01-06T14:46:01.673Z time=2026-01-06T14:46:01.673Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44257" 2026-01-06T14:46:01.928Z time=2026-01-06T14:46:01.926Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36617" 2026-01-06T14:46:02.173Z time=2026-01-06T14:46:02.173Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35851" 2026-01-06T14:46:02.423Z time=2026-01-06T14:46:02.423Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43455" 2026-01-06T14:46:02.673Z time=2026-01-06T14:46:02.673Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 39881" 2026-01-06T14:46:02.923Z time=2026-01-06T14:46:02.923Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 45497" 2026-01-06T14:46:03.173Z time=2026-01-06T14:46:03.173Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 33945" 2026-01-06T14:46:03.423Z time=2026-01-06T14:46:03.423Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 33415" 
2026-01-06T14:46:03.673Z time=2026-01-06T14:46:03.673Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41883" 2026-01-06T14:46:03.924Z time=2026-01-06T14:46:03.923Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35599" 2026-01-06T14:46:04.173Z time=2026-01-06T14:46:04.173Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 34825" 2026-01-06T14:46:04.423Z time=2026-01-06T14:46:04.423Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41857" 2026-01-06T14:46:04.676Z time=2026-01-06T14:46:04.676Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35041" 2026-01-06T14:46:04.924Z time=2026-01-06T14:46:04.923Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 33923" 2026-01-06T14:46:05.173Z time=2026-01-06T14:46:05.173Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37359" 2026-01-06T14:46:05.424Z time=2026-01-06T14:46:05.424Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44197" 2026-01-06T14:46:05.673Z time=2026-01-06T14:46:05.673Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36093" 2026-01-06T14:46:05.923Z time=2026-01-06T14:46:05.923Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 33603" 2026-01-06T14:46:06.173Z time=2026-01-06T14:46:06.173Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42247" 2026-01-06T14:46:06.423Z time=2026-01-06T14:46:06.423Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 
36061" 2026-01-06T14:46:06.629Z time=2026-01-06T14:46:06.628Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-df8b6415ce11eeaa85d11f8c4288c489aa3818354d9691d71523bcdffb5f2fa8 --port 40931" 2026-01-06T14:46:06.629Z time=2026-01-06T14:46:06.629Z level=INFO source=sched.go:443 msg="system memory" total="16.0 GiB" free="13.4 GiB" free_swap="18.0 GiB" 2026-01-06T14:46:06.629Z time=2026-01-06T14:46:06.629Z level=INFO source=sched.go:450 msg="gpu memory" id=GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede library=CUDA available="11.1 GiB" free="11.6 GiB" minimum="457.0 MiB" overhead="0 B" 2026-01-06T14:46:06.629Z time=2026-01-06T14:46:06.629Z level=INFO source=server.go:746 msg="loading model" "model layers"=37 requested=-1 2026-01-06T14:46:06.642Z time=2026-01-06T14:46:06.642Z level=INFO source=runner.go:1405 msg="starting ollama engine" 2026-01-06T14:46:06.642Z time=2026-01-06T14:46:06.642Z level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:40931" 2026-01-06T14:46:06.652Z time=2026-01-06T14:46:06.652Z level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" 2026-01-06T14:46:06.690Z time=2026-01-06T14:46:06.690Z level=INFO source=ggml.go:136 msg="" architecture=qwen25vl file_type=Q4_K_M name="" description="" num_tensors=953 num_key_values=36 2026-01-06T14:46:06.695Z load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so 2026-01-06T14:46:06.769Z ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no 2026-01-06T14:46:06.769Z ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no 2026-01-06T14:46:06.769Z ggml_cuda_init: found 1 CUDA devices: 2026-01-06T14:46:06.769Z Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, ID: 
GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede 2026-01-06T14:46:06.769Z load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so 2026-01-06T14:46:06.769Z time=2026-01-06T14:46:06.769Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) 2026-01-06T14:46:07.969Z time=2026-01-06T14:46:07.969Z level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" 2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:GPU-2324d0b0-9a1d-8bb1-6241-75acd1170ede Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" 2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=ggml.go:482 msg="offloading 36 repeating layers to GPU" 2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU" 2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=ggml.go:494 msg="offloaded 37/37 layers to GPU" 2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="3.0 GiB" 2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=device.go:245 msg="model weights" device=CPU size="243.4 MiB" 2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="144.0 MiB" 
2026-01-06T14:46:08.916Z time=2026-01-06T14:46:08.916Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="7.5 GiB" 2026-01-06T14:46:08.917Z time=2026-01-06T14:46:08.916Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="20.3 MiB" 2026-01-06T14:46:08.917Z time=2026-01-06T14:46:08.916Z level=INFO source=device.go:272 msg="total memory" size="10.8 GiB" 2026-01-06T14:46:08.917Z time=2026-01-06T14:46:08.916Z level=INFO source=sched.go:517 msg="loaded runners" count=1 2026-01-06T14:46:08.917Z time=2026-01-06T14:46:08.916Z level=INFO source=server.go:1338 msg="waiting for llama runner to start responding" 2026-01-06T14:46:08.917Z time=2026-01-06T14:46:08.917Z level=INFO source=server.go:1372 msg="waiting for server to become available" status="llm server loading model" 2026-01-06T14:46:09.670Z time=2026-01-06T14:46:09.669Z level=INFO source=server.go:1376 msg="llama runner started in 3.04 seconds" ```
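The per-device figures in the load log above account for most of the footprint; summing them is a quick sanity check (unit conversion only, figures copied from the log — the small CPU-side buffers make up the rest of the reported 10.8 GiB total). Note that the 7.5 GiB compute graph dwarfs the weights and kv cache:

```python
# Allocation figures reported by the runner in the log above (CUDA0 device)
weights_gib = 3.0      # model weights
kv_cache_mib = 144.0   # kv cache
graph_gib = 7.5        # compute graph

gpu_total_gib = weights_gib + kv_cache_mib / 1024 + graph_gib
print(f"CUDA0 footprint: {gpu_total_gib:.2f} GiB")  # ≈ 10.64 GiB of 11.6 GiB free
```

This fits (barely) on the 12 GiB RTX 3060 here, which is consistent with the OOM reports on 8 GiB cards later in this thread.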
@Kenerator commented on GitHub (Jan 7, 2026):

Getting this same GGML assertion failure on an Apple M3 Max. Qwen3-vl works fine on Ollama 0.13.5, though, for those who don't want to downgrade Ollama to 0.12.x.

@RevengeRip commented on GitHub (Jan 8, 2026):

Getting a similar error (I think) on Ollama 0.13.5 from the Arch Linux repos with the ROCm backend (ROCm version 7.1.1).

[ollama.log](https://github.com/user-attachments/files/24493066/ollama.log)

@rick-github commented on GitHub (Jan 8, 2026):

@RevengeRip Post the full server log.

@RevengeRip commented on GitHub (Jan 8, 2026):

@rick-github

[ollama-full.log](https://github.com/user-attachments/files/24496362/ollama-full.log)

@heema commented on GitHub (Jan 23, 2026):

I'm experiencing the same issue on RTX 3070 Ti (8GB).

**Working version:** v0.13.3
**Broken versions:** v0.13.5, v0.13.15, v0.14.x (latest)
**Model:** qwen2.5-vl (Q4_K_M)

v0.13.3 loads all 29 layers to GPU successfully with KvSize:8192.
Newer versions enter "low vram mode" and fail with "cudaMalloc failed: out of memory" when trying to allocate 7350.58 MiB.

Logs attached showing the comparison between working v0.13.3 and broken v0.13.15.

[ollama-0_13_3.log](https://github.com/user-attachments/files/24824528/ollama-0_13_3.log)
[ollama-0_13_5.log](https://github.com/user-attachments/files/24824527/ollama-0_13_5.log)
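The failing allocation here lines up with the oversized compute-graph buffer seen in the original RTX 3060 log (7.5 GiB on CUDA0); converting the reported request shows why it cannot coexist with the model weights on an 8 GiB card. Unit conversion only — the 3 GiB weights figure is taken from the earlier RTX 3060 log and assumed comparable for this Q4_K_M model:

```python
# Figures from this comment and the original load log above
failed_alloc_mib = 7350.58  # cudaMalloc request that fails on the 8 GiB card
weights_gib = 3.0           # Q4_K_M weights (from the RTX 3060 log; assumed similar here)

needed_gib = failed_alloc_mib / 1024 + weights_gib
print(f"needed ≈ {needed_gib:.2f} GiB vs 8 GiB of VRAM")  # ≈ 10.18 GiB
```

That gap would explain both the "low vram mode" fallback and the subsequent OOM on newer builds.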

Reference: github-starred/ollama#34727