0.6.6 running on RTX4090 runs out of VRAM when using mistral-small3.1 to analyze one image #6828

Closed
opened 2025-11-12 13:46:20 -06:00 by GiteaMirror · 5 comments

Originally created by @codearranger on GitHub (Apr 23, 2025).

What is the issue?

Ollama runs out of VRAM with mistral-small3.1 when analyzing a single tiled image.

Relevant log output

ollama-1  | time=2025-04-23T03:17:19.000Z level=WARN source=sched.go:648 msg="gpu VRAM usage didn't recover within timeout" seconds=5.18963552 model=/root/.ollama/models/blobs/sha256-1fa8532d986d729117d6b5ac2c884824d0717c9468094554fd1d36412c740cfc
ollama-1  | time=2025-04-23T03:17:19.254Z level=WARN source=sched.go:648 msg="gpu VRAM usage didn't recover within timeout" seconds=5.443351597 model=/root/.ollama/models/blobs/sha256-1fa8532d986d729117d6b5ac2c884824d0717c9468094554fd1d36412c740cfc
ollama-1  | time=2025-04-23T03:17:19.807Z level=WARN source=sched.go:648 msg="gpu VRAM usage didn't recover within timeout" seconds=5.996507944 model=/root/.ollama/models/blobs/sha256-1fa8532d986d729117d6b5ac2c884824d0717c9468094554fd1d36412c740cfc
ollama-1  | time=2025-04-23T03:17:20.075Z level=INFO source=server.go:105 msg="system memory" total="251.7 GiB" free="204.9 GiB" free_swap="70.7 GiB"
ollama-1  | time=2025-04-23T03:17:20.077Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=41 layers.offload=37 layers.split="" memory.available="[23.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="24.4 GiB" memory.required.partial="23.0 GiB" memory.required.kv="640.0 MiB" memory.required.allocations="[23.0 GiB]" memory.weights.total="13.1 GiB" memory.weights.repeating="12.7 GiB" memory.weights.nonrepeating="360.0 MiB" memory.graph.full="426.7 MiB" memory.graph.partial="426.7 MiB" projector.weights="769.3 MiB" projector.graph="8.8 GiB"
ollama-1  | time=2025-04-23T03:17:20.168Z level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.pretokenizer default="[^\\r\\n\\p{L}\\p{N}]?[\\p{Lu}\\p{Lt}\\p{Lm}\\p{Lo}\\p{M}]*[\\p{Ll}\\p{Lm}\\p{Lo}\\p{M}]+|[^\\r\\n\\p{L}\\p{N}]?[\\p{Lu}\\p{Lt}\\p{Lm}\\p{Lo}\\p{M}]+[\\p{Ll}\\p{Lm}\\p{Lo}\\p{M}]*|\\p{N}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n/]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
ollama-1  | time=2025-04-23T03:17:20.172Z level=WARN source=ggml.go:152 msg="key not found" key=mistral3.rope.freq_scale default=1
ollama-1  | time=2025-04-23T03:17:20.172Z level=WARN source=ggml.go:152 msg="key not found" key=mistral3.vision.attention.layer_norm_epsilon default=9.999999747378752e-06
ollama-1  | time=2025-04-23T03:17:20.172Z level=WARN source=ggml.go:152 msg="key not found" key=mistral3.vision.longest_edge default=1540
ollama-1  | time=2025-04-23T03:17:20.172Z level=WARN source=ggml.go:152 msg="key not found" key=mistral3.text_config.rms_norm_eps default=9.999999747378752e-06
ollama-1  | time=2025-04-23T03:17:20.172Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-1fa8532d986d729117d6b5ac2c884824d0717c9468094554fd1d36412c740cfc --ctx-size 4096 --batch-size 512 --n-gpu-layers 37 --threads 36 --parallel 1 --port 42459"
ollama-1  | time=2025-04-23T03:17:20.173Z level=INFO source=sched.go:451 msg="loaded runners" count=1
ollama-1  | time=2025-04-23T03:17:20.173Z level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
ollama-1  | time=2025-04-23T03:17:20.174Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
ollama-1  | time=2025-04-23T03:17:20.193Z level=INFO source=runner.go:816 msg="starting ollama engine"
ollama-1  | time=2025-04-23T03:17:20.194Z level=INFO source=runner.go:879 msg="Server listening on 127.0.0.1:42459"
ollama-1  | time=2025-04-23T03:17:20.306Z level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
ollama-1  | time=2025-04-23T03:17:20.306Z level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
ollama-1  | time=2025-04-23T03:17:20.306Z level=INFO source=ggml.go:67 msg="" architecture=mistral3 file_type=Q4_K_M name="" description="" num_tensors=585 num_key_values=43
ollama-1  | ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ollama-1  | ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ollama-1  | ggml_cuda_init: found 1 CUDA devices:
ollama-1  |   Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
ollama-1  | load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
ollama-1  | load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
ollama-1  | time=2025-04-23T03:17:20.425Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
ollama-1  | time=2025-04-23T03:17:20.426Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
ollama-1  | time=2025-04-23T03:17:20.649Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="2.7 GiB"
ollama-1  | time=2025-04-23T03:17:20.649Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="11.7 GiB"
helium@sob:~/git/ollama$ ^C
helium@sob:~/git/ollama$ docker compose logs -n 300 ollama
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 64 gp=0xc0004c2fc0 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0x55ba857a6a20?, 0x3?, 0xe?, 0xaf?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc0004bcf38 sp=0xc0004bcf18 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc0004bcfc8 sp=0xc0004bcf38 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc0004bcfe0 sp=0xc0004bcfc8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc0004bcfe8 sp=0xc0004bcfe0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 65 gp=0xc0004c3180 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0xb46601afdcb68?, 0x3?, 0xc5?, 0xfd?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc0004bd738 sp=0xc0004bd718 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc0004bd7c8 sp=0xc0004bd738 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc0004bd7e0 sp=0xc0004bd7c8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc0004bd7e8 sp=0xc0004bd7e0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 66 gp=0xc0004c3340 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0xb46601b03d951?, 0x3?, 0xb8?, 0x5e?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc0004bdf38 sp=0xc0004bdf18 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc0004bdfc8 sp=0xc0004bdf38 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc0004bdfe0 sp=0xc0004bdfc8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc0004bdfe8 sp=0xc0004bdfe0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 67 gp=0xc0004c3500 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0xb46601b096df2?, 0x1?, 0x51?, 0xb0?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc0004c8738 sp=0xc0004c8718 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc0004c87c8 sp=0xc0004c8738 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc0004c87e0 sp=0xc0004c87c8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc0004c87e8 sp=0xc0004c87e0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 68 gp=0xc0004c36c0 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0xb46601b088469?, 0x1?, 0x48?, 0x2f?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc0004c8f38 sp=0xc0004c8f18 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc0004c8fc8 sp=0xc0004c8f38 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc0004c8fe0 sp=0xc0004c8fc8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc0004c8fe8 sp=0xc0004c8fe0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 69 gp=0xc0004c3880 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0x55ba857a6a20?, 0x1?, 0x38?, 0x90?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc0004c9738 sp=0xc0004c9718 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc0004c97c8 sp=0xc0004c9738 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc0004c97e0 sp=0xc0004c97c8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc0004c97e8 sp=0xc0004c97e0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 70 gp=0xc0004c3a40 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0x55ba857a6a20?, 0x1?, 0x24?, 0x3e?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc0004c9f38 sp=0xc0004c9f18 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc0004c9fc8 sp=0xc0004c9f38 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc0004c9fe0 sp=0xc0004c9fc8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc0004c9fe8 sp=0xc0004c9fe0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 71 gp=0xc0004c3c00 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0xb46601b052039?, 0x3?, 0x2e?, 0x13?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc0004ca738 sp=0xc0004ca718 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc0004ca7c8 sp=0xc0004ca738 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc0004ca7e0 sp=0xc0004ca7c8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc0004ca7e8 sp=0xc0004ca7e0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 72 gp=0xc0004c3dc0 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0xb46601b030ad0?, 0x3?, 0x64?, 0x6c?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc0004caf38 sp=0xc0004caf18 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc0004cafc8 sp=0xc0004caf38 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc0004cafe0 sp=0xc0004cafc8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc0004cafe8 sp=0xc0004cafe0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 73 gp=0xc0004cc000 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0xb46601b0523d5?, 0x3?, 0xf6?, 0xcd?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc0004cb738 sp=0xc0004cb718 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc0004cb7c8 sp=0xc0004cb738 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc0004cb7e0 sp=0xc0004cb7c8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc0004cb7e8 sp=0xc0004cb7e0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 82 gp=0xc000182380 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0xb46601b017e17?, 0x3?, 0x99?, 0xef?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc0004c4738 sp=0xc0004c4718 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc0004c47c8 sp=0xc0004c4738 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc0004c47e0 sp=0xc0004c47c8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc0004c47e8 sp=0xc0004c47e0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 98 gp=0xc000504000 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0xb46601b03d88a?, 0x3?, 0xb4?, 0x84?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc00050a738 sp=0xc00050a718 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc00050a7c8 sp=0xc00050a738 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc00050a7e0 sp=0xc00050a7c8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc00050a7e8 sp=0xc00050a7e0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 99 gp=0xc0005041c0 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0xb46601b0519e3?, 0x1?, 0xa6?, 0xc9?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc00050af38 sp=0xc00050af18 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc00050afc8 sp=0xc00050af38 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc00050afe0 sp=0xc00050afc8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc00050afe8 sp=0xc00050afe0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 100 gp=0xc000504380 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0xb46601b02c70f?, 0x3?, 0xbe?, 0x4b?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc00050b738 sp=0xc00050b718 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc00050b7c8 sp=0xc00050b738 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc00050b7e0 sp=0xc00050b7c8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc00050b7e8 sp=0xc00050b7e0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 101 gp=0xc000504540 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0xb46601b018576?, 0x3?, 0x95?, 0xeb?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc00050bf38 sp=0xc00050bf18 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc00050bfc8 sp=0xc00050bf38 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc00050bfe0 sp=0xc00050bfc8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc00050bfe8 sp=0xc00050bfe0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 102 gp=0xc000504700 m=nil [GC worker (idle)]:
ollama-1  | runtime.gopark(0xb46601b096db3?, 0x1?, 0x93?, 0xbf?, 0x0?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc00050c738 sp=0xc00050c718 pc=0x55ba83bb2dae
ollama-1  | runtime.gcBgMarkWorker(0xc000137570)
ollama-1  |     runtime/mgc.go:1423 +0xe9 fp=0xc00050c7c8 sp=0xc00050c738 pc=0x55ba83b60449
ollama-1  | runtime.gcBgMarkStartWorkers.gowrap1()
ollama-1  |     runtime/mgc.go:1339 +0x25 fp=0xc00050c7e0 sp=0xc00050c7c8 pc=0x55ba83b60325
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc00050c7e8 sp=0xc00050c7e0 pc=0x55ba83bba4e1
ollama-1  | created by runtime.gcBgMarkStartWorkers in goroutine 1
ollama-1  |     runtime/mgc.go:1339 +0x105
ollama-1  | 
ollama-1  | goroutine 103 gp=0xc0004cc8c0 m=nil [select]:
ollama-1  | runtime.gopark(0xc000049a28?, 0x2?, 0x0?, 0xba?, 0xc000049894?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc0000496a8 sp=0xc000049688 pc=0x55ba83bb2dae
ollama-1  | runtime.selectgo(0xc000049a28, 0xc000049890, 0x1000?, 0x0, 0x4?, 0x1)
ollama-1  |     runtime/select.go:351 +0x837 fp=0xc0000497e0 sp=0xc0000496a8 pc=0x55ba83b91697
ollama-1  | github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc000352a20, {0x55ba84e92438, 0xc000000700}, 0xc0016003c0)
ollama-1  |     github.com/ollama/ollama/runner/ollamarunner/runner.go:677 +0xb05 fp=0xc000049ac0 sp=0xc0000497e0 pc=0x55ba84067945
ollama-1  | github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x55ba84e92438?, 0xc000000700?}, 0xc000197b40?)
ollama-1  |     <autogenerated>:1 +0x36 fp=0xc000049af0 sp=0xc000049ac0 pc=0x55ba84069ef6
ollama-1  | net/http.HandlerFunc.ServeHTTP(0xc0004f4780?, {0x55ba84e92438?, 0xc000000700?}, 0xc000197b60?)
ollama-1  |     net/http/server.go:2294 +0x29 fp=0xc000049b18 sp=0xc000049af0 pc=0x55ba83eb13c9
ollama-1  | net/http.(*ServeMux).ServeHTTP(0x55ba83b57465?, {0x55ba84e92438, 0xc000000700}, 0xc0016003c0)
ollama-1  |     net/http/server.go:2822 +0x1c4 fp=0xc000049b68 sp=0xc000049b18 pc=0x55ba83eb32c4
ollama-1  | net/http.serverHandler.ServeHTTP({0x55ba84e8eb10?}, {0x55ba84e92438?, 0xc000000700?}, 0x1?)
ollama-1  |     net/http/server.go:3301 +0x8e fp=0xc000049b98 sp=0xc000049b68 pc=0x55ba83ed0d4e
ollama-1  | net/http.(*conn).serve(0xc0002fc240, {0x55ba84e94518, 0xc0002f6c00})
ollama-1  |     net/http/server.go:2102 +0x625 fp=0xc000049fb8 sp=0xc000049b98 pc=0x55ba83eaf8c5
ollama-1  | net/http.(*Server).Serve.gowrap3()
ollama-1  |     net/http/server.go:3454 +0x28 fp=0xc000049fe0 sp=0xc000049fb8 pc=0x55ba83eb5188
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x55ba83bba4e1
ollama-1  | created by net/http.(*Server).Serve in goroutine 1
ollama-1  |     net/http/server.go:3454 +0x485
ollama-1  | 
ollama-1  | goroutine 347 gp=0xc00255ee00 m=nil [IO wait]:
ollama-1  | runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?)
ollama-1  |     runtime/proc.go:435 +0xce fp=0xc0016fb5d8 sp=0xc0016fb5b8 pc=0x55ba83bb2dae
ollama-1  | runtime.netpollblock(0x55ba83bd6238?, 0x83b4c566?, 0xba?)
ollama-1  |     runtime/netpoll.go:575 +0xf7 fp=0xc0016fb610 sp=0xc0016fb5d8 pc=0x55ba83b77b97
ollama-1  | internal/poll.runtime_pollWait(0x796b56e21d98, 0x72)
ollama-1  |     runtime/netpoll.go:351 +0x85 fp=0xc0016fb630 sp=0xc0016fb610 pc=0x55ba83bb1fc5
ollama-1  | internal/poll.(*pollDesc).wait(0xc0006d0000?, 0xc0002b8131?, 0x0)
ollama-1  |     internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0016fb658 sp=0xc0016fb630 pc=0x55ba83c39447
ollama-1  | internal/poll.(*pollDesc).waitRead(...)
ollama-1  |     internal/poll/fd_poll_runtime.go:89
ollama-1  | internal/poll.(*FD).Read(0xc0006d0000, {0xc0002b8131, 0x1, 0x1})
ollama-1  |     internal/poll/fd_unix.go:165 +0x27a fp=0xc0016fb6f0 sp=0xc0016fb658 pc=0x55ba83c3a73a
ollama-1  | net.(*netFD).Read(0xc0006d0000, {0xc0002b8131?, 0x0?, 0x0?})
ollama-1  |     net/fd_posix.go:55 +0x25 fp=0xc0016fb738 sp=0xc0016fb6f0 pc=0x55ba83caf685
ollama-1  | net.(*conn).Read(0xc00061c000, {0xc0002b8131?, 0x0?, 0x0?})
ollama-1  |     net/net.go:194 +0x45 fp=0xc0016fb780 sp=0xc0016fb738 pc=0x55ba83cbda45
ollama-1  | net/http.(*connReader).backgroundRead(0xc0002b8120)
ollama-1  |     net/http/server.go:690 +0x37 fp=0xc0016fb7c8 sp=0xc0016fb780 pc=0x55ba83ea9797
ollama-1  | net/http.(*connReader).startBackgroundRead.gowrap2()
ollama-1  |     net/http/server.go:686 +0x25 fp=0xc0016fb7e0 sp=0xc0016fb7c8 pc=0x55ba83ea96c5
ollama-1  | runtime.goexit({})
ollama-1  |     runtime/asm_amd64.s:1700 +0x1 fp=0xc0016fb7e8 sp=0xc0016fb7e0 pc=0x55ba83bba4e1
ollama-1  | created by net/http.(*connReader).startBackgroundRead in goroutine 103
ollama-1  |     net/http/server.go:686 +0xb6
ollama-1  | 
ollama-1  | rax    0x0
ollama-1  | rbx    0x796a26a00700
ollama-1  | rcx    0x796b5703200b
ollama-1  | rdx    0x0
ollama-1  | rdi    0x2
ollama-1  | rsi    0x796a269ff8f0
ollama-1  | rbp    0x7969d4e01c85
ollama-1  | rsp    0x796a269ff8f0
ollama-1  | r8     0x0
ollama-1  | r9     0x796a269ff8f0
ollama-1  | r10    0x8
ollama-1  | r11    0x246
ollama-1  | r12    0x7969d4e021b8
ollama-1  | r13    0x49
ollama-1  | r14    0x796b0c7e04f8
ollama-1  | r15    0x796ab800bbe0
ollama-1  | rip    0x796b5703200b
ollama-1  | rflags 0x246
ollama-1  | cs     0x33
ollama-1  | fs     0x0
ollama-1  | gs     0x0
ollama-1  | [GIN] 2025/04/23 - 03:17:31 | 500 | 17.974505069s |   192.168.176.1 | POST     "/api/chat"
ollama-1  | [GIN] 2025/04/23 - 03:17:31 | 500 | 17.860236069s |   192.168.176.1 | POST     "/api/chat"
ollama-1  | [GIN] 2025/04/23 - 03:17:31 | 500 | 17.771792067s |   192.168.176.1 | POST     "/api/chat"
ollama-1  | [GIN] 2025/04/23 - 03:17:31 | 500 | 15.635726626s |   192.168.176.1 | POST     "/api/chat"
ollama-1  | time=2025-04-23T03:17:31.379Z level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 2"
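The `msg=offload` line at 03:17:20.077 is where the scheduler records its VRAM plan. The Go program below is a minimal sketch of a hypothetical helper written for this report, not part of Ollama. It parses those `key=value` fields and compares the estimates against the reported free memory, and it assumes the single-GPU `memory.available="[X GiB]"` form shown in this log:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// fieldRe matches key=value pairs in a structured log line. A value is either
// a double-quoted string (which may contain spaces) or a bare token.
var fieldRe = regexp.MustCompile(`([\w.]+)=("[^"]*"|\S+)`)

// parseFields returns the key=value pairs of a log line with surrounding
// quotes removed. It does not handle escaped quotes inside values.
func parseFields(line string) map[string]string {
	fields := make(map[string]string)
	for _, m := range fieldRe.FindAllStringSubmatch(line, -1) {
		fields[m[1]] = strings.Trim(m[2], `"`)
	}
	return fields
}

// parseGiB converts a size such as "23.1 GiB", "640.0 MiB" or "0 B" to GiB.
// A single bracketed entry like "[23.1 GiB]" (one GPU) is accepted;
// multi-GPU lists are rejected.
func parseGiB(s string) (float64, error) {
	parts := strings.Fields(strings.Trim(s, "[]"))
	if len(parts) != 2 {
		return 0, fmt.Errorf("unexpected size %q", s)
	}
	v, err := strconv.ParseFloat(parts[0], 64)
	if err != nil {
		return 0, err
	}
	switch parts[1] {
	case "GiB":
		return v, nil
	case "MiB":
		return v / 1024, nil
	case "KiB":
		return v / (1024 * 1024), nil
	case "B":
		return v / (1024 * 1024 * 1024), nil
	default:
		return 0, fmt.Errorf("unknown unit %q", parts[1])
	}
}

// offloadSummary reports, in GiB, how far the full-offload estimate exceeds
// the free VRAM and how much headroom the partial-offload estimate leaves.
func offloadSummary(line string) (fullShort, partialHeadroom float64, err error) {
	f := parseFields(line)
	avail, err := parseGiB(f["memory.available"])
	if err != nil {
		return 0, 0, err
	}
	full, err := parseGiB(f["memory.required.full"])
	if err != nil {
		return 0, 0, err
	}
	partial, err := parseGiB(f["memory.required.partial"])
	if err != nil {
		return 0, 0, err
	}
	return full - avail, avail - partial, nil
}

func main() {
	// Abbreviated copy of the offload line from the log above (some fields dropped).
	line := `level=INFO source=server.go:138 msg=offload library=cuda layers.model=41 layers.offload=37 ` +
		`memory.available="[23.1 GiB]" memory.required.full="24.4 GiB" memory.required.partial="23.0 GiB" projector.graph="8.8 GiB"`
	fields := parseFields(line)
	fmt.Printf("layers on GPU: %s of %s\n", fields["layers.offload"], fields["layers.model"])
	fmt.Printf("vision projector graph estimate: %s\n", fields["projector.graph"])
	short, headroom, err := offloadSummary(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("full offload exceeds available VRAM by %.1f GiB\n", short)
	fmt.Printf("partial offload leaves %.1f GiB of headroom\n", headroom)
}
```

With the values above, full offload would need 1.3 GiB more than the 23.1 GiB reported as available. The 37-of-41-layer partial plan, which includes the 8.8 GiB `projector.graph` estimate, leaves only about 0.1 GiB of headroom. That margin alone does not prove the estimate is wrong, but it shows how little slack the scheduler had before the runner terminated with exit status 2.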

OS

Docker with NVIDIA CUDA support

GPU

RTX4090

CPU

Dual Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

Ollama version

0.6.5 and 0.6.6

ollama-1 | runtime/proc.go:435 +0xce fp=0xc0004c4738 sp=0xc0004c4718 pc=0x55ba83bb2dae ollama-1 | runtime.gcBgMarkWorker(0xc000137570) ollama-1 | runtime/mgc.go:1423 +0xe9 fp=0xc0004c47c8 sp=0xc0004c4738 pc=0x55ba83b60449 ollama-1 | runtime.gcBgMarkStartWorkers.gowrap1() ollama-1 | runtime/mgc.go:1339 +0x25 fp=0xc0004c47e0 sp=0xc0004c47c8 pc=0x55ba83b60325 ollama-1 | runtime.goexit({}) ollama-1 | runtime/asm_amd64.s:1700 +0x1 fp=0xc0004c47e8 sp=0xc0004c47e0 pc=0x55ba83bba4e1 ollama-1 | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama-1 | runtime/mgc.go:1339 +0x105 ollama-1 | ollama-1 | goroutine 98 gp=0xc000504000 m=nil [GC worker (idle)]: ollama-1 | runtime.gopark(0xb46601b03d88a?, 0x3?, 0xb4?, 0x84?, 0x0?) ollama-1 | runtime/proc.go:435 +0xce fp=0xc00050a738 sp=0xc00050a718 pc=0x55ba83bb2dae ollama-1 | runtime.gcBgMarkWorker(0xc000137570) ollama-1 | runtime/mgc.go:1423 +0xe9 fp=0xc00050a7c8 sp=0xc00050a738 pc=0x55ba83b60449 ollama-1 | runtime.gcBgMarkStartWorkers.gowrap1() ollama-1 | runtime/mgc.go:1339 +0x25 fp=0xc00050a7e0 sp=0xc00050a7c8 pc=0x55ba83b60325 ollama-1 | runtime.goexit({}) ollama-1 | runtime/asm_amd64.s:1700 +0x1 fp=0xc00050a7e8 sp=0xc00050a7e0 pc=0x55ba83bba4e1 ollama-1 | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama-1 | runtime/mgc.go:1339 +0x105 ollama-1 | ollama-1 | goroutine 99 gp=0xc0005041c0 m=nil [GC worker (idle)]: ollama-1 | runtime.gopark(0xb46601b0519e3?, 0x1?, 0xa6?, 0xc9?, 0x0?) 
ollama-1 | runtime/proc.go:435 +0xce fp=0xc00050af38 sp=0xc00050af18 pc=0x55ba83bb2dae ollama-1 | runtime.gcBgMarkWorker(0xc000137570) ollama-1 | runtime/mgc.go:1423 +0xe9 fp=0xc00050afc8 sp=0xc00050af38 pc=0x55ba83b60449 ollama-1 | runtime.gcBgMarkStartWorkers.gowrap1() ollama-1 | runtime/mgc.go:1339 +0x25 fp=0xc00050afe0 sp=0xc00050afc8 pc=0x55ba83b60325 ollama-1 | runtime.goexit({}) ollama-1 | runtime/asm_amd64.s:1700 +0x1 fp=0xc00050afe8 sp=0xc00050afe0 pc=0x55ba83bba4e1 ollama-1 | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama-1 | runtime/mgc.go:1339 +0x105 ollama-1 | ollama-1 | goroutine 100 gp=0xc000504380 m=nil [GC worker (idle)]: ollama-1 | runtime.gopark(0xb46601b02c70f?, 0x3?, 0xbe?, 0x4b?, 0x0?) ollama-1 | runtime/proc.go:435 +0xce fp=0xc00050b738 sp=0xc00050b718 pc=0x55ba83bb2dae ollama-1 | runtime.gcBgMarkWorker(0xc000137570) ollama-1 | runtime/mgc.go:1423 +0xe9 fp=0xc00050b7c8 sp=0xc00050b738 pc=0x55ba83b60449 ollama-1 | runtime.gcBgMarkStartWorkers.gowrap1() ollama-1 | runtime/mgc.go:1339 +0x25 fp=0xc00050b7e0 sp=0xc00050b7c8 pc=0x55ba83b60325 ollama-1 | runtime.goexit({}) ollama-1 | runtime/asm_amd64.s:1700 +0x1 fp=0xc00050b7e8 sp=0xc00050b7e0 pc=0x55ba83bba4e1 ollama-1 | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama-1 | runtime/mgc.go:1339 +0x105 ollama-1 | ollama-1 | goroutine 101 gp=0xc000504540 m=nil [GC worker (idle)]: ollama-1 | runtime.gopark(0xb46601b018576?, 0x3?, 0x95?, 0xeb?, 0x0?) 
ollama-1 | runtime/proc.go:435 +0xce fp=0xc00050bf38 sp=0xc00050bf18 pc=0x55ba83bb2dae ollama-1 | runtime.gcBgMarkWorker(0xc000137570) ollama-1 | runtime/mgc.go:1423 +0xe9 fp=0xc00050bfc8 sp=0xc00050bf38 pc=0x55ba83b60449 ollama-1 | runtime.gcBgMarkStartWorkers.gowrap1() ollama-1 | runtime/mgc.go:1339 +0x25 fp=0xc00050bfe0 sp=0xc00050bfc8 pc=0x55ba83b60325 ollama-1 | runtime.goexit({}) ollama-1 | runtime/asm_amd64.s:1700 +0x1 fp=0xc00050bfe8 sp=0xc00050bfe0 pc=0x55ba83bba4e1 ollama-1 | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama-1 | runtime/mgc.go:1339 +0x105 ollama-1 | ollama-1 | goroutine 102 gp=0xc000504700 m=nil [GC worker (idle)]: ollama-1 | runtime.gopark(0xb46601b096db3?, 0x1?, 0x93?, 0xbf?, 0x0?) ollama-1 | runtime/proc.go:435 +0xce fp=0xc00050c738 sp=0xc00050c718 pc=0x55ba83bb2dae ollama-1 | runtime.gcBgMarkWorker(0xc000137570) ollama-1 | runtime/mgc.go:1423 +0xe9 fp=0xc00050c7c8 sp=0xc00050c738 pc=0x55ba83b60449 ollama-1 | runtime.gcBgMarkStartWorkers.gowrap1() ollama-1 | runtime/mgc.go:1339 +0x25 fp=0xc00050c7e0 sp=0xc00050c7c8 pc=0x55ba83b60325 ollama-1 | runtime.goexit({}) ollama-1 | runtime/asm_amd64.s:1700 +0x1 fp=0xc00050c7e8 sp=0xc00050c7e0 pc=0x55ba83bba4e1 ollama-1 | created by runtime.gcBgMarkStartWorkers in goroutine 1 ollama-1 | runtime/mgc.go:1339 +0x105 ollama-1 | ollama-1 | goroutine 103 gp=0xc0004cc8c0 m=nil [select]: ollama-1 | runtime.gopark(0xc000049a28?, 0x2?, 0x0?, 0xba?, 0xc000049894?) 
ollama-1 | runtime/proc.go:435 +0xce fp=0xc0000496a8 sp=0xc000049688 pc=0x55ba83bb2dae ollama-1 | runtime.selectgo(0xc000049a28, 0xc000049890, 0x1000?, 0x0, 0x4?, 0x1) ollama-1 | runtime/select.go:351 +0x837 fp=0xc0000497e0 sp=0xc0000496a8 pc=0x55ba83b91697 ollama-1 | github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc000352a20, {0x55ba84e92438, 0xc000000700}, 0xc0016003c0) ollama-1 | github.com/ollama/ollama/runner/ollamarunner/runner.go:677 +0xb05 fp=0xc000049ac0 sp=0xc0000497e0 pc=0x55ba84067945 ollama-1 | github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x55ba84e92438?, 0xc000000700?}, 0xc000197b40?) ollama-1 | <autogenerated>:1 +0x36 fp=0xc000049af0 sp=0xc000049ac0 pc=0x55ba84069ef6 ollama-1 | net/http.HandlerFunc.ServeHTTP(0xc0004f4780?, {0x55ba84e92438?, 0xc000000700?}, 0xc000197b60?) ollama-1 | net/http/server.go:2294 +0x29 fp=0xc000049b18 sp=0xc000049af0 pc=0x55ba83eb13c9 ollama-1 | net/http.(*ServeMux).ServeHTTP(0x55ba83b57465?, {0x55ba84e92438, 0xc000000700}, 0xc0016003c0) ollama-1 | net/http/server.go:2822 +0x1c4 fp=0xc000049b68 sp=0xc000049b18 pc=0x55ba83eb32c4 ollama-1 | net/http.serverHandler.ServeHTTP({0x55ba84e8eb10?}, {0x55ba84e92438?, 0xc000000700?}, 0x1?) ollama-1 | net/http/server.go:3301 +0x8e fp=0xc000049b98 sp=0xc000049b68 pc=0x55ba83ed0d4e ollama-1 | net/http.(*conn).serve(0xc0002fc240, {0x55ba84e94518, 0xc0002f6c00}) ollama-1 | net/http/server.go:2102 +0x625 fp=0xc000049fb8 sp=0xc000049b98 pc=0x55ba83eaf8c5 ollama-1 | net/http.(*Server).Serve.gowrap3() ollama-1 | net/http/server.go:3454 +0x28 fp=0xc000049fe0 sp=0xc000049fb8 pc=0x55ba83eb5188 ollama-1 | runtime.goexit({}) ollama-1 | runtime/asm_amd64.s:1700 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x55ba83bba4e1 ollama-1 | created by net/http.(*Server).Serve in goroutine 1 ollama-1 | net/http/server.go:3454 +0x485 ollama-1 | ollama-1 | goroutine 347 gp=0xc00255ee00 m=nil [IO wait]: ollama-1 | runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?) 
ollama-1 | runtime/proc.go:435 +0xce fp=0xc0016fb5d8 sp=0xc0016fb5b8 pc=0x55ba83bb2dae ollama-1 | runtime.netpollblock(0x55ba83bd6238?, 0x83b4c566?, 0xba?) ollama-1 | runtime/netpoll.go:575 +0xf7 fp=0xc0016fb610 sp=0xc0016fb5d8 pc=0x55ba83b77b97 ollama-1 | internal/poll.runtime_pollWait(0x796b56e21d98, 0x72) ollama-1 | runtime/netpoll.go:351 +0x85 fp=0xc0016fb630 sp=0xc0016fb610 pc=0x55ba83bb1fc5 ollama-1 | internal/poll.(*pollDesc).wait(0xc0006d0000?, 0xc0002b8131?, 0x0) ollama-1 | internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0016fb658 sp=0xc0016fb630 pc=0x55ba83c39447 ollama-1 | internal/poll.(*pollDesc).waitRead(...) ollama-1 | internal/poll/fd_poll_runtime.go:89 ollama-1 | internal/poll.(*FD).Read(0xc0006d0000, {0xc0002b8131, 0x1, 0x1}) ollama-1 | internal/poll/fd_unix.go:165 +0x27a fp=0xc0016fb6f0 sp=0xc0016fb658 pc=0x55ba83c3a73a ollama-1 | net.(*netFD).Read(0xc0006d0000, {0xc0002b8131?, 0x0?, 0x0?}) ollama-1 | net/fd_posix.go:55 +0x25 fp=0xc0016fb738 sp=0xc0016fb6f0 pc=0x55ba83caf685 ollama-1 | net.(*conn).Read(0xc00061c000, {0xc0002b8131?, 0x0?, 0x0?}) ollama-1 | net/net.go:194 +0x45 fp=0xc0016fb780 sp=0xc0016fb738 pc=0x55ba83cbda45 ollama-1 | net/http.(*connReader).backgroundRead(0xc0002b8120) ollama-1 | net/http/server.go:690 +0x37 fp=0xc0016fb7c8 sp=0xc0016fb780 pc=0x55ba83ea9797 ollama-1 | net/http.(*connReader).startBackgroundRead.gowrap2() ollama-1 | net/http/server.go:686 +0x25 fp=0xc0016fb7e0 sp=0xc0016fb7c8 pc=0x55ba83ea96c5 ollama-1 | runtime.goexit({}) ollama-1 | runtime/asm_amd64.s:1700 +0x1 fp=0xc0016fb7e8 sp=0xc0016fb7e0 pc=0x55ba83bba4e1 ollama-1 | created by net/http.(*connReader).startBackgroundRead in goroutine 103 ollama-1 | net/http/server.go:686 +0xb6 ollama-1 | ollama-1 | rax 0x0 ollama-1 | rbx 0x796a26a00700 ollama-1 | rcx 0x796b5703200b ollama-1 | rdx 0x0 ollama-1 | rdi 0x2 ollama-1 | rsi 0x796a269ff8f0 ollama-1 | rbp 0x7969d4e01c85 ollama-1 | rsp 0x796a269ff8f0 ollama-1 | r8 0x0 ollama-1 | r9 0x796a269ff8f0 ollama-1 | r10 0x8 
ollama-1 | r11 0x246
ollama-1 | r12 0x7969d4e021b8
ollama-1 | r13 0x49
ollama-1 | r14 0x796b0c7e04f8
ollama-1 | r15 0x796ab800bbe0
ollama-1 | rip 0x796b5703200b
ollama-1 | rflags 0x246
ollama-1 | cs 0x33
ollama-1 | fs 0x0
ollama-1 | gs 0x0
ollama-1 | [GIN] 2025/04/23 - 03:17:31 | 500 | 17.974505069s | 192.168.176.1 | POST "/api/chat"
ollama-1 | [GIN] 2025/04/23 - 03:17:31 | 500 | 17.860236069s | 192.168.176.1 | POST "/api/chat"
ollama-1 | [GIN] 2025/04/23 - 03:17:31 | 500 | 17.771792067s | 192.168.176.1 | POST "/api/chat"
ollama-1 | [GIN] 2025/04/23 - 03:17:31 | 500 | 15.635726626s | 192.168.176.1 | POST "/api/chat"
ollama-1 | time=2025-04-23T03:17:31.379Z level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 2"
```

### OS

Docker with NVIDIA CUDA support

### GPU

RTX4090

### CPU

Dual Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

### Ollama version

0.6.5 and 0.6.6
GiteaMirror added the bug label 2025-11-12 13:46:20 -06:00

@codearranger commented on GitHub (Apr 23, 2025):

I just tested the 0.6.6 prerelease and got the same results.

![Image](https://github.com/user-attachments/assets/c937ed1a-8d21-4c99-846e-da72765029c8)

@codearranger commented on GitHub (Apr 23, 2025):

![Image](https://github.com/user-attachments/assets/c119c527-da37-4b18-bad9-cb2f0b964d51)

![Image](https://github.com/user-attachments/assets/13e5c1d7-42ca-4e13-99b9-a093314d8e8f)

@jessegross commented on GitHub (Apr 24, 2025):

Can you please post the full log?

@codearranger commented on GitHub (Apr 25, 2025):

[ollama.log](https://github.com/user-attachments/files/19902245/ollama.log)

@jessegross commented on GitHub (Apr 25, 2025):

Thanks, this looks like the same issue as #10234, so I'm going to consolidate them over there. There is a patch linked to that issue, though I have not reviewed it yet.

Reference: github-starred/ollama-ollama#6828