[GH-ISSUE #11753] gpt-oss:20b OOM during inference with large context size #85476

Closed
opened 2026-05-10 00:16:33 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @mikhail-shevtsov-wiregate on GitHub (Aug 6, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11753

What is the issue?

I've noticed memory leak when using large context with gpt-oss:20b on Nvidia RTX 3090
In my example I had num_ctx: 36864 and actual context ~34k
After 30 seconds it crashes with OOM. I don't see such problem with qwen3 models. Memory consumption is static.

https://github.com/user-attachments/assets/164417bd-7e40-46a9-abc0-15f18272e5af

Relevant log output

Full log
releasing cuda driver library
time=2025-08-06T19:42:45.813Z level=INFO source=server.go:175 msg=offload library=cuda layers.requested=-1 layers.model=25 layers.offload=25 layers.split="" memory.available="[23.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="22.6 GiB" memory.required.partial="22.6 GiB" memory.required.kv="972.0 MiB" memory.required.allocations="[22.6 GiB]" memory.weights.total="11.7 GiB" memory.weights.repeating="10.7 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="9.0 GiB" memory.graph.partial="9.0 GiB"
time=2025-08-06T19:42:45.813Z level=WARN source=server.go:211 msg="flash attention enabled but not supported by model"
time=2025-08-06T19:42:45.813Z level=WARN source=server.go:229 msg="quantized kv cache requested but flash attention disabled" type=q8_0
time=2025-08-06T19:42:45.813Z level=DEBUG source=server.go:291 msg="compatible gpu libraries" compatible=[]
time=2025-08-06T19:42:45.882Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32
time=2025-08-06T19:42:45.883Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=tokenizer.ggml.pretokenizer default="[^\\r\\n\\p{L}\\p{N}]?[\\p{Lu}\\p{Lt}\\p{Lm}\\p{Lo}\\p{M}]*[\\p{Ll}\\p{Lm}\\p{Lo}\\p{M}]+(?i:'s|'t|'re|'ve|'m|'ll|'d)?|[^\\r\\n\\p{L}\\p{N}]?[\\p{Lu}\\p{Lt}\\p{Lm}\\p{Lo}\\p{M}]+[\\p{Ll}\\p{Lm}\\p{Lo}\\p{M}]*(?i:'s|'t|'re|'ve|'m|'ll|'d)?|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n/]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-08-06T19:42:45.883Z level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 --ctx-size 36864 --batch-size 512 --n-gpu-layers 25 --threads 8 --parallel 1 --port 44317"
time=2025-08-06T19:42:45.883Z level=DEBUG source=server.go:439 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_KV_CACHE_TYPE=q8_0 OLLAMA_DEBUG=1 OLLAMA_FLASH_ATTENTION=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/ollama OLLAMA_HOST=0.0.0.0:11434 OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_LIBRARY_PATH=/usr/lib/ollama CUDA_VISIBLE_DEVICES=GPU-4a702f36-6e27-c7db-15af-79cfe15ee9df
time=2025-08-06T19:42:45.884Z level=INFO source=sched.go:481 msg="loaded runners" count=1
time=2025-08-06T19:42:45.884Z level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
time=2025-08-06T19:42:45.884Z level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
time=2025-08-06T19:42:45.900Z level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-08-06T19:42:45.901Z level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:44317"
time=2025-08-06T19:42:45.983Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32
time=2025-08-06T19:42:45.983Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.name default=""
time=2025-08-06T19:42:45.984Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.description default=""
time=2025-08-06T19:42:45.984Z level=INFO source=ggml.go:92 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=315 num_key_values=30
time=2025-08-06T19:42:45.984Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /usr/lib/ollama/libggml-cuda.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
time=2025-08-06T19:42:46.050Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-08-06T19:42:46.135Z level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
time=2025-08-06T19:42:46.141Z level=INFO source=ggml.go:367 msg="offloading 24 repeating layers to GPU"
time=2025-08-06T19:42:46.141Z level=INFO source=ggml.go:373 msg="offloading output layer to GPU"
time=2025-08-06T19:42:46.141Z level=INFO source=ggml.go:378 msg="offloaded 25/25 layers to GPU"
time=2025-08-06T19:42:46.141Z level=INFO source=ggml.go:381 msg="model weights" buffer=CUDA0 size="11.7 GiB"
time=2025-08-06T19:42:46.141Z level=INFO source=ggml.go:381 msg="model weights" buffer=CPU size="1.1 GiB"
time=2025-08-06T19:42:46.142Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=tokenizer.ggml.pretokenizer default="[^\\r\\n\\p{L}\\p{N}]?[\\p{Lu}\\p{Lt}\\p{Lm}\\p{Lo}\\p{M}]*[\\p{Ll}\\p{Lm}\\p{Lo}\\p{M}]+(?i:'s|'t|'re|'ve|'m|'ll|'d)?|[^\\r\\n\\p{L}\\p{N}]?[\\p{Lu}\\p{Lt}\\p{Lm}\\p{Lo}\\p{M}]+[\\p{Ll}\\p{Lm}\\p{Lo}\\p{M}]*(?i:'s|'t|'re|'ve|'m|'ll|'d)?|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n/]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-08-06T19:42:46.158Z level=DEBUG source=ggml.go:654 msg="compute graph" nodes=1847 splits=2
time=2025-08-06T19:42:46.158Z level=INFO source=ggml.go:672 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="9.1 GiB"
time=2025-08-06T19:42:46.158Z level=INFO source=ggml.go:672 msg="compute graph" backend=CPU buffer_type=CPU size="5.6 MiB"
time=2025-08-06T19:42:46.158Z level=DEBUG source=runner.go:883 msg=memory allocated.InputWeights=1158266880A allocated.CPU.Graph=5898240A allocated.CUDA0.ID=GPU-4a702f36-6e27-c7db-15af-79cfe15ee9df allocated.CUDA0.Weights="[477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 1158278400A]" allocated.CUDA0.Cache="[9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 0U]" allocated.CUDA0.Graph=9779415296A
time=2025-08-06T19:42:46.386Z level=DEBUG source=server.go:643 msg="model load progress 0.03"
time=2025-08-06T19:42:46.637Z level=DEBUG source=server.go:643 msg="model load progress 0.07"
time=2025-08-06T19:42:46.888Z level=DEBUG source=server.go:643 msg="model load progress 0.10"
time=2025-08-06T19:42:47.139Z level=DEBUG source=server.go:643 msg="model load progress 0.13"
time=2025-08-06T19:42:47.390Z level=DEBUG source=server.go:643 msg="model load progress 0.16"
time=2025-08-06T19:42:47.641Z level=DEBUG source=server.go:643 msg="model load progress 0.20"
time=2025-08-06T19:42:47.892Z level=DEBUG source=server.go:643 msg="model load progress 0.23"
time=2025-08-06T19:42:48.143Z level=DEBUG source=server.go:643 msg="model load progress 0.27"
time=2025-08-06T19:42:48.393Z level=DEBUG source=server.go:643 msg="model load progress 0.30"
time=2025-08-06T19:42:48.644Z level=DEBUG source=server.go:643 msg="model load progress 0.34"
time=2025-08-06T19:42:48.895Z level=DEBUG source=server.go:643 msg="model load progress 0.37"
time=2025-08-06T19:42:49.146Z level=DEBUG source=server.go:643 msg="model load progress 0.40"
time=2025-08-06T19:42:49.397Z level=DEBUG source=server.go:643 msg="model load progress 0.43"
time=2025-08-06T19:42:49.648Z level=DEBUG source=server.go:643 msg="model load progress 0.47"
time=2025-08-06T19:42:49.899Z level=DEBUG source=server.go:643 msg="model load progress 0.50"
time=2025-08-06T19:42:50.149Z level=DEBUG source=server.go:643 msg="model load progress 0.53"
time=2025-08-06T19:42:50.400Z level=DEBUG source=server.go:643 msg="model load progress 0.56"
time=2025-08-06T19:42:50.651Z level=DEBUG source=server.go:643 msg="model load progress 0.60"
time=2025-08-06T19:42:50.902Z level=DEBUG source=server.go:643 msg="model load progress 0.63"
time=2025-08-06T19:42:51.153Z level=DEBUG source=server.go:643 msg="model load progress 0.66"
time=2025-08-06T19:42:51.404Z level=DEBUG source=server.go:643 msg="model load progress 0.69"
time=2025-08-06T19:42:51.655Z level=DEBUG source=server.go:643 msg="model load progress 0.73"
time=2025-08-06T19:42:51.906Z level=DEBUG source=server.go:643 msg="model load progress 0.79"
time=2025-08-06T19:42:52.156Z level=DEBUG source=server.go:643 msg="model load progress 0.85"
time=2025-08-06T19:42:52.407Z level=DEBUG source=server.go:643 msg="model load progress 0.92"
time=2025-08-06T19:42:52.658Z level=DEBUG source=server.go:643 msg="model load progress 0.95"
time=2025-08-06T19:42:52.909Z level=DEBUG source=server.go:643 msg="model load progress 0.97"
time=2025-08-06T19:42:53.160Z level=DEBUG source=server.go:643 msg="model load progress 1.00"
time=2025-08-06T19:42:53.411Z level=INFO source=server.go:637 msg="llama runner started in 7.53 seconds"
time=2025-08-06T19:42:53.411Z level=DEBUG source=sched.go:493 msg="finished setting up" runner.name=registry.ollama.ai/library/gpt-oss:20b runner.inference=cuda runner.devices=1 runner.size="22.6 GiB" runner.vram="22.6 GiB" runner.parallel=1 runner.pid=648 runner.model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 runner.num_ctx=36864
time=2025-08-06T19:42:53.411Z level=DEBUG source=server.go:736 msg="completion request" images=0 prompt=2876 format=""
time=2025-08-06T19:42:53.491Z level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=0 prompt=768 used=0 remaining=768
[GIN] 2025/08/06 - 19:43:01 | 200 | 18.015478674s |      172.18.0.1 | POST     "/api/chat"
time=2025-08-06T19:43:01.888Z level=DEBUG source=sched.go:501 msg="context for request finished"
time=2025-08-06T19:43:01.889Z level=DEBUG source=sched.go:341 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gpt-oss:20b runner.inference=cuda runner.devices=1 runner.size="22.6 GiB" runner.vram="22.6 GiB" runner.parallel=1 runner.pid=648 runner.model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 runner.num_ctx=36864 duration=2562047h47m16.854775807s
time=2025-08-06T19:43:01.889Z level=DEBUG source=sched.go:359 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gpt-oss:20b runner.inference=cuda runner.devices=1 runner.size="22.6 GiB" runner.vram="22.6 GiB" runner.parallel=1 runner.pid=648 runner.model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 runner.num_ctx=36864 refCount=0
[GIN] 2025/08/06 - 19:45:45 | 200 |      25.857µs |       127.0.0.1 | HEAD     "/"
[GIN] 2025/08/06 - 19:45:45 | 200 |      42.668µs |       127.0.0.1 | GET      "/api/ps"
time=2025-08-06T19:47:23.851Z level=DEBUG source=sched.go:613 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583
time=2025-08-06T19:47:23.852Z level=DEBUG source=server.go:736 msg="completion request" images=0 prompt=137980 format=""
time=2025-08-06T19:47:23.957Z level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1242 prompt=31570 used=64 remaining=31506
time=2025-08-06T19:47:29.468Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache"
time=2025-08-06T19:47:35.977Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache"
time=2025-08-06T19:47:36.908Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache"
time=2025-08-06T19:47:44.951Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache"
time=2025-08-06T19:47:46.040Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache"
time=2025-08-06T19:47:55.345Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache"
time=2025-08-06T19:47:56.641Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache"
time=2025-08-06T19:48:07.482Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache"
time=2025-08-06T19:48:08.981Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache"
CUDA error: out of memory
  current device: 0, in function alloc at //ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:452
  cuMemCreate(&handle, reserve_size, &prop, 0)
//ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:77: CUDA error
SIGSEGV: segmentation violation
PC=0x7b5de1024fb7 m=14 sigcode=1 addr=0x204a03f7c
signal arrived during cgo execution

goroutine 22 gp=0xc000306e00 m=14 mp=0xc000581008 [syscall]:
runtime.cgocall(0x610fe9e51720, 0xc0042f3a58)
    runtime/cgocall.go:167 +0x4b fp=0xc0042f3a30 sp=0xc0042f39f8 pc=0x610fe91848cb
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x7b5df90110f0, 0x7b572c006150)
    _cgo_gotypes.go:886 +0x4a fp=0xc0042f3a58 sp=0xc0042f3a30 pc=0x610fe95bec6a
github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute.func1(...)
    github.com/ollama/ollama/ml/backend/ggml/ggml.go:631
github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc001a68000, {0xc034cbc360, 0x1, 0x0?})
    github.com/ollama/ollama/ml/backend/ggml/ggml.go:631 +0x9d fp=0xc0042f3b00 sp=0xc0042f3a58 pc=0x610fe95ca17d
github.com/ollama/ollama/model.Forward({0x610fea505a90, 0xc001a68000}, {0x610fea4fc2b0, 0xc0000e96b0}, {0xc0013b8800, 0x200, 0x200}, {{0x610fea5106e8, 0xc0018f0018}, {0x0, ...}, ...})
    github.com/ollama/ollama/model/model.go:305 +0x2a7 fp=0xc0042f3be8 sp=0xc0042f3b00 pc=0x610fe95d8147
github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc0007285a0)
    github.com/ollama/ollama/runner/ollamarunner/runner.go:480 +0x4c5 fp=0xc0042f3f98 sp=0xc0042f3be8 pc=0x610fe9679085
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0007285a0, {0x610fea4fd790, 0xc000690690})
    github.com/ollama/ollama/runner/ollamarunner/runner.go:362 +0x4e fp=0xc0042f3fb8 sp=0xc0042f3f98 pc=0x610fe9678b6e
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2()
    github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0x28 fp=0xc0042f3fe0 sp=0xc0042f3fb8 pc=0x610fe967e2c8
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0042f3fe8 sp=0xc0042f3fe0 pc=0x610fe918f481
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
    github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74

goroutine 1 gp=0xc000002380 m=nil [IO wait, 2 minutes]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000313650 sp=0xc000313630 pc=0x610fe9187d4e
runtime.netpollblock(0xc0003136a0?, 0xe9120b46?, 0xf?)
    runtime/netpoll.go:575 +0xf7 fp=0xc000313688 sp=0xc000313650 pc=0x610fe914c837
internal/poll.runtime_pollWait(0x7b5e085c6eb0, 0x72)
    runtime/netpoll.go:351 +0x85 fp=0xc0003136a8 sp=0xc000313688 pc=0x610fe9186f65
internal/poll.(*pollDesc).wait(0xc000716280?, 0x90012ae3e?, 0x0)
    internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0003136d0 sp=0xc0003136a8 pc=0x610fe920e3a7
internal/poll.(*pollDesc).waitRead(...)
    internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000716280)
    internal/poll/fd_unix.go:620 +0x295 fp=0xc000313778 sp=0xc0003136d0 pc=0x610fe9213775
net.(*netFD).accept(0xc000716280)
    net/fd_unix.go:172 +0x29 fp=0xc000313830 sp=0xc000313778 pc=0x610fe9285d89
net.(*TCPListener).accept(0xc0001406c0)
    net/tcpsock_posix.go:159 +0x1b fp=0xc000313880 sp=0xc000313830 pc=0x610fe929b73b
net.(*TCPListener).Accept(0xc0001406c0)
    net/tcpsock.go:380 +0x30 fp=0xc0003138b0 sp=0xc000313880 pc=0x610fe929a5f0
net/http.(*onceCloseListener).Accept(0xc0005443f0?)
    <autogenerated>:1 +0x24 fp=0xc0003138c8 sp=0xc0003138b0 pc=0x610fe94b1d44
net/http.(*Server).Serve(0xc0001ff400, {0x610fea4fb2e8, 0xc0001406c0})
    net/http/server.go:3424 +0x30c fp=0xc0003139f8 sp=0xc0003138c8 pc=0x610fe948960c
github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000034150, 0xe, 0xf})
    github.com/ollama/ollama/runner/ollamarunner/runner.go:984 +0xe09 fp=0xc000313d08 sp=0xc0003139f8 pc=0x610fe967e029
github.com/ollama/ollama/runner.Execute({0xc000034130?, 0x0?, 0x0?})
    github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc000313d30 sp=0xc000313d08 pc=0x610fe967e929
github.com/ollama/ollama/cmd.NewCLI.func2(0xc0001ff200?, {0x610fea03e07e?, 0x4?, 0x610fea03e082?})
    github.com/ollama/ollama/cmd/cmd.go:1583 +0x45 fp=0xc000313d58 sp=0xc000313d30 pc=0x610fe9de3685
github.com/spf13/cobra.(*Command).execute(0xc000546f08, {0xc0005a8870, 0xf, 0xf})
    github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000313e78 sp=0xc000313d58 pc=0x610fe92ff3dc
github.com/spf13/cobra.(*Command).ExecuteC(0xc000734908)
    github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000313f30 sp=0xc000313e78 pc=0x610fe92ffc25
github.com/spf13/cobra.(*Command).Execute(...)
    github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
    github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
    github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000313f50 sp=0xc000313f30 pc=0x610fe9de416d
runtime.main()
    runtime/proc.go:283 +0x29d fp=0xc000313fe0 sp=0xc000313f50 pc=0x610fe9153ebd
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000313fe8 sp=0xc000313fe0 pc=0x610fe918f481

goroutine 2 gp=0xc000002e00 m=nil [force gc (idle), 5 minutes]:
runtime.gopark(0x239b11b73f90?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x610fe9187d4e
runtime.goparkunlock(...)
    runtime/proc.go:441
runtime.forcegchelper()
    runtime/proc.go:348 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x610fe91541f8
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x610fe918f481
created by runtime.init.7 in goroutine 1
    runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x610fe9187d4e
runtime.goparkunlock(...)
    runtime/proc.go:441
runtime.bgsweep(0xc0000aa000)
    runtime/mgcsweep.go:316 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x610fe913e99f
runtime.gcenable.gowrap1()
    runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x610fe9132d85
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x610fe918f481
created by runtime.gcenable in goroutine 1
    runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x496245?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x610fe9187d4e
runtime.goparkunlock(...)
    runtime/proc.go:441
runtime.(*scavengerState).park(0x610fead929a0)
    runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x610fe913c3e9
runtime.bgscavenge(0xc0000aa000)
    runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x610fe913c979
runtime.gcenable.gowrap2()
    runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x610fe9132d25
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x610fe918f481
created by runtime.gcenable in goroutine 1
    runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait, 5 minutes]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000084688?)
    runtime/proc.go:435 +0xce fp=0xc000084630 sp=0xc000084610 pc=0x610fe9187d4e
runtime.runfinq()
    runtime/mfinal.go:196 +0x107 fp=0xc0000847e0 sp=0xc000084630 pc=0x610fe9131d47
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x610fe918f481
created by runtime.createfing in goroutine 1
    runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc0001dc8c0 m=nil [chan receive]:
runtime.gopark(0xc000235540?, 0xc0018f08b8?, 0x60?, 0x67?, 0x610fe926c9c8?)
    runtime/proc.go:435 +0xce fp=0xc000086718 sp=0xc0000866f8 pc=0x610fe9187d4e
runtime.chanrecv(0xc0000b8310, 0x0, 0x1)
    runtime/chan.go:664 +0x445 fp=0xc000086790 sp=0xc000086718 pc=0x610fe9123725
runtime.chanrecv1(0x0?, 0x0?)
    runtime/chan.go:506 +0x12 fp=0xc0000867b8 sp=0xc000086790 pc=0x610fe91232b2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
    runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
    runtime/mgc.go:1799 +0x2f fp=0xc0000867e0 sp=0xc0000867b8 pc=0x610fe9135f2f
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x610fe918f481
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
    runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc0001dce00 m=nil [GC worker (idle)]:
runtime.gopark(0x23c1603d3d41?, 0x3?, 0xc3?, 0x71?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 8 gp=0xc0001dcfc0 m=nil [GC worker (idle)]:
runtime.gopark(0x610feae411e0?, 0x1?, 0x95?, 0x3a?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000087738 sp=0xc000087718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc0000877c8 sp=0xc000087738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc0000877e0 sp=0xc0000877c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 9 gp=0xc0001dd180 m=nil [GC worker (idle)]:
runtime.gopark(0x610feae411e0?, 0x1?, 0xab?, 0x3?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000087f38 sp=0xc000087f18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc000087fc8 sp=0xc000087f38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc000087fe0 sp=0xc000087fc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 10 gp=0xc0001dd340 m=nil [GC worker (idle)]:
runtime.gopark(0x23bef7e001da?, 0x3?, 0x95?, 0x7?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000080738 sp=0xc000080718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc0000807c8 sp=0xc000080738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 11 gp=0xc0001dd500 m=nil [GC worker (idle)]:
runtime.gopark(0x23c1603ccc08?, 0x3?, 0xa5?, 0xc7?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 12 gp=0xc0001dd6c0 m=nil [GC worker (idle)]:
runtime.gopark(0x23c15fb13c79?, 0x1?, 0x11?, 0x7e?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000081738 sp=0xc000081718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc0000817c8 sp=0xc000081738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc0000817e0 sp=0xc0000817c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 13 gp=0xc0001dd880 m=nil [GC worker (idle)]:
runtime.gopark(0x23c1603d1eae?, 0x3?, 0x8e?, 0xb2?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 14 gp=0xc0001dda40 m=nil [GC worker (idle)]:
runtime.gopark(0x610feae411e0?, 0x1?, 0x9d?, 0xd3?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000082738 sp=0xc000082718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc0000827c8 sp=0xc000082738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc0000827e0 sp=0xc0000827c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000827e8 sp=0xc0000827e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 15 gp=0xc0001ddc00 m=nil [GC worker (idle)]:
runtime.gopark(0x610feae411e0?, 0x1?, 0x88?, 0xc1?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000082f38 sp=0xc000082f18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc000082fc8 sp=0xc000082f38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc000082fe0 sp=0xc000082fc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000082fe8 sp=0xc000082fe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 16 gp=0xc0001dddc0 m=nil [GC worker (idle)]:
runtime.gopark(0x23c1603f6bf7?, 0x1?, 0xfa?, 0xa7?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000083738 sp=0xc000083718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc0000837c8 sp=0xc000083738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc0000837e0 sp=0xc0000837c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000837e8 sp=0xc0000837e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc000524000 m=nil [GC worker (idle)]:
runtime.gopark(0x23c15fb13c5e?, 0x3?, 0x66?, 0xb1?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000083f38 sp=0xc000083f18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc000083fc8 sp=0xc000083f38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc000083fe0 sp=0xc000083fc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000083fe8 sp=0xc000083fe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc0005241c0 m=nil [GC worker (idle)]:
runtime.gopark(0x23c15fb136df?, 0x3?, 0xbf?, 0xf9?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc00052a738 sp=0xc00052a718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc00052a7c8 sp=0xc00052a738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc00052a7e0 sp=0xc00052a7c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc00052a7e8 sp=0xc00052a7e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc000306000 m=nil [GC worker (idle)]:
runtime.gopark(0x23c1603c83f3?, 0x3?, 0xbc?, 0xf1?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000526738 sp=0xc000526718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc0005267c8 sp=0xc000526738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc0005267e0 sp=0xc0005267c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0005267e8 sp=0xc0005267e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 50 gp=0xc000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x610feae411e0?, 0x1?, 0x9a?, 0xa1?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 51 gp=0xc000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x23c15fb1177d?, 0x1?, 0x0?, 0x1c?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000524380 m=nil [GC worker (idle)]:
runtime.gopark(0x23c1607d4ede?, 0x3?, 0x5f?, 0x98?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc00052af38 sp=0xc00052af18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc00052afc8 sp=0xc00052af38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc00052afe0 sp=0xc00052afc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc00052afe8 sp=0xc00052afe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 72 gp=0xc000583180 m=nil [select, 2 minutes]:
runtime.gopark(0xc000315a10?, 0x2?, 0x0?, 0x0?, 0xc000315874?)
    runtime/proc.go:435 +0xce fp=0xc0003156a0 sp=0xc000315680 pc=0x610fe9187d4e
runtime.selectgo(0xc000315a10, 0xc000315870, 0x7b52?, 0x0, 0x4?, 0x1)
    runtime/select.go:351 +0x837 fp=0xc0003157d8 sp=0xc0003156a0 pc=0x610fe91663b7
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc0007285a0, {0x610fea4fb4c8, 0xc000000000}, 0xc001b00140)
    github.com/ollama/ollama/runner/ollamarunner/runner.go:680 +0xb65 fp=0xc000315ac0 sp=0xc0003157d8 pc=0x610fe967b3c5
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x610fea4fb4c8?, 0xc000000000?}, 0xc000315b40?)
    <autogenerated>:1 +0x36 fp=0xc000315af0 sp=0xc000315ac0 pc=0x610fe967e796
net/http.HandlerFunc.ServeHTTP(0xc0005b52c0?, {0x610fea4fb4c8?, 0xc000000000?}, 0xc000315b60?)
    net/http/server.go:2294 +0x29 fp=0xc000315b18 sp=0xc000315af0 pc=0x610fe9485c49
net/http.(*ServeMux).ServeHTTP(0x610fe912c265?, {0x610fea4fb4c8, 0xc000000000}, 0xc001b00140)
    net/http/server.go:2822 +0x1c4 fp=0xc000315b68 sp=0xc000315b18 pc=0x610fe9487b44
net/http.serverHandler.ServeHTTP({0x610fea4f7b10?}, {0x610fea4fb4c8?, 0xc000000000?}, 0x1?)
    net/http/server.go:3301 +0x8e fp=0xc000315b98 sp=0xc000315b68 pc=0x610fe94a55ce
net/http.(*conn).serve(0xc0005443f0, {0x610fea4fd758, 0xc0000ffaa0})
    net/http/server.go:2102 +0x625 fp=0xc000315fb8 sp=0xc000315b98 pc=0x610fe9484145
net/http.(*Server).Serve.gowrap3()
    net/http/server.go:3454 +0x28 fp=0xc000315fe0 sp=0xc000315fb8 pc=0x610fe9489a08
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000315fe8 sp=0xc000315fe0 pc=0x610fe918f481
created by net/http.(*Server).Serve in goroutine 1
    net/http/server.go:3454 +0x485

goroutine 73 gp=0xc000307a40 m=nil [IO wait, 2 minutes]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?)
    runtime/proc.go:435 +0xce fp=0xc0014b55d8 sp=0xc0014b55b8 pc=0x610fe9187d4e
runtime.netpollblock(0x610fe91ab0b8?, 0xe9120b46?, 0xf?)
    runtime/netpoll.go:575 +0xf7 fp=0xc0014b5610 sp=0xc0014b55d8 pc=0x610fe914c837
internal/poll.runtime_pollWait(0x7b5e085c6d98, 0x72)
    runtime/netpoll.go:351 +0x85 fp=0xc0014b5630 sp=0xc0014b5610 pc=0x610fe9186f65
internal/poll.(*pollDesc).wait(0xc00004e080?, 0xc00221c0a1?, 0x0)
    internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0014b5658 sp=0xc0014b5630 pc=0x610fe920e3a7
internal/poll.(*pollDesc).waitRead(...)
    internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00004e080, {0xc00221c0a1, 0x1, 0x1})
    internal/poll/fd_unix.go:165 +0x27a fp=0xc0014b56f0 sp=0xc0014b5658 pc=0x610fe920f69a
net.(*netFD).Read(0xc00004e080, {0xc00221c0a1?, 0xc0018f2058?, 0xc0014b5770?})
    net/fd_posix.go:55 +0x25 fp=0xc0014b5738 sp=0xc0014b56f0 pc=0x610fe9283de5
net.(*conn).Read(0xc001f10000, {0xc00221c0a1?, 0x0?, 0x0?})
    net/net.go:194 +0x45 fp=0xc0014b5780 sp=0xc0014b5738 pc=0x610fe92921a5
net/http.(*connReader).backgroundRead(0xc00221c090)
    net/http/server.go:690 +0x37 fp=0xc0014b57c8 sp=0xc0014b5780 pc=0x610fe947e017
net/http.(*connReader).startBackgroundRead.gowrap2()
    net/http/server.go:686 +0x25 fp=0xc0014b57e0 sp=0xc0014b57c8 pc=0x610fe947df45
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0014b57e8 sp=0xc0014b57e0 pc=0x610fe918f481
created by net/http.(*connReader).startBackgroundRead in goroutine 72
    net/http/server.go:686 +0xb6

rax    0x204a03f7c
rbx    0x7b5df87b1d20
rcx    0xfdf
rdx    0x7b5df8649b10
rdi    0x7b5df8649b20
rsi    0x0
rbp    0x7b5d2d7fd500
rsp    0x7b5d2d7fd4e0
r8     0x0
r9     0x8d4a9743
r10    0x0
r11    0x246
r12    0x7b56f000e520
r13    0x7b5df8649b20
r14    0x0
r15    0x7b5df8002d10
rip    0x7b5de1024fb7
rflags 0x10297
cs     0x33
fs     0x0
gs     0x0
SIGABRT: abort
PC=0x7b5e4f96db2c m=14 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 22 gp=0xc000306e00 m=14 mp=0xc000581008 [syscall]:
runtime.cgocall(0x610fe9e51720, 0xc0042f3a58)
    runtime/cgocall.go:167 +0x4b fp=0xc0042f3a30 sp=0xc0042f39f8 pc=0x610fe91848cb
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x7b5df90110f0, 0x7b572c006150)
    _cgo_gotypes.go:886 +0x4a fp=0xc0042f3a58 sp=0xc0042f3a30 pc=0x610fe95bec6a
github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute.func1(...)
    github.com/ollama/ollama/ml/backend/ggml/ggml.go:631
github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc001a68000, {0xc034cbc360, 0x1, 0x0?})
    github.com/ollama/ollama/ml/backend/ggml/ggml.go:631 +0x9d fp=0xc0042f3b00 sp=0xc0042f3a58 pc=0x610fe95ca17d
github.com/ollama/ollama/model.Forward({0x610fea505a90, 0xc001a68000}, {0x610fea4fc2b0, 0xc0000e96b0}, {0xc0013b8800, 0x200, 0x200}, {{0x610fea5106e8, 0xc0018f0018}, {0x0, ...}, ...})
    github.com/ollama/ollama/model/model.go:305 +0x2a7 fp=0xc0042f3be8 sp=0xc0042f3b00 pc=0x610fe95d8147
github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc0007285a0)
    github.com/ollama/ollama/runner/ollamarunner/runner.go:480 +0x4c5 fp=0xc0042f3f98 sp=0xc0042f3be8 pc=0x610fe9679085
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0007285a0, {0x610fea4fd790, 0xc000690690})
    github.com/ollama/ollama/runner/ollamarunner/runner.go:362 +0x4e fp=0xc0042f3fb8 sp=0xc0042f3f98 pc=0x610fe9678b6e
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2()
    github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0x28 fp=0xc0042f3fe0 sp=0xc0042f3fb8 pc=0x610fe967e2c8
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0042f3fe8 sp=0xc0042f3fe0 pc=0x610fe918f481
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
    github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74

goroutine 1 gp=0xc000002380 m=nil [IO wait, 2 minutes]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000313650 sp=0xc000313630 pc=0x610fe9187d4e
runtime.netpollblock(0xc0003136a0?, 0xe9120b46?, 0xf?)
    runtime/netpoll.go:575 +0xf7 fp=0xc000313688 sp=0xc000313650 pc=0x610fe914c837
internal/poll.runtime_pollWait(0x7b5e085c6eb0, 0x72)
    runtime/netpoll.go:351 +0x85 fp=0xc0003136a8 sp=0xc000313688 pc=0x610fe9186f65
internal/poll.(*pollDesc).wait(0xc000716280?, 0x90012ae3e?, 0x0)
    internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0003136d0 sp=0xc0003136a8 pc=0x610fe920e3a7
internal/poll.(*pollDesc).waitRead(...)
    internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000716280)
    internal/poll/fd_unix.go:620 +0x295 fp=0xc000313778 sp=0xc0003136d0 pc=0x610fe9213775
net.(*netFD).accept(0xc000716280)
    net/fd_unix.go:172 +0x29 fp=0xc000313830 sp=0xc000313778 pc=0x610fe9285d89
net.(*TCPListener).accept(0xc0001406c0)
    net/tcpsock_posix.go:159 +0x1b fp=0xc000313880 sp=0xc000313830 pc=0x610fe929b73b
net.(*TCPListener).Accept(0xc0001406c0)
    net/tcpsock.go:380 +0x30 fp=0xc0003138b0 sp=0xc000313880 pc=0x610fe929a5f0
net/http.(*onceCloseListener).Accept(0xc0005443f0?)
    <autogenerated>:1 +0x24 fp=0xc0003138c8 sp=0xc0003138b0 pc=0x610fe94b1d44
net/http.(*Server).Serve(0xc0001ff400, {0x610fea4fb2e8, 0xc0001406c0})
    net/http/server.go:3424 +0x30c fp=0xc0003139f8 sp=0xc0003138c8 pc=0x610fe948960c
github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000034150, 0xe, 0xf})
    github.com/ollama/ollama/runner/ollamarunner/runner.go:984 +0xe09 fp=0xc000313d08 sp=0xc0003139f8 pc=0x610fe967e029
github.com/ollama/ollama/runner.Execute({0xc000034130?, 0x0?, 0x0?})
    github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc000313d30 sp=0xc000313d08 pc=0x610fe967e929
github.com/ollama/ollama/cmd.NewCLI.func2(0xc0001ff200?, {0x610fea03e07e?, 0x4?, 0x610fea03e082?})
    github.com/ollama/ollama/cmd/cmd.go:1583 +0x45 fp=0xc000313d58 sp=0xc000313d30 pc=0x610fe9de3685
github.com/spf13/cobra.(*Command).execute(0xc000546f08, {0xc0005a8870, 0xf, 0xf})
    github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000313e78 sp=0xc000313d58 pc=0x610fe92ff3dc
github.com/spf13/cobra.(*Command).ExecuteC(0xc000734908)
    github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000313f30 sp=0xc000313e78 pc=0x610fe92ffc25
github.com/spf13/cobra.(*Command).Execute(...)
    github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
    github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
    github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000313f50 sp=0xc000313f30 pc=0x610fe9de416d
runtime.main()
    runtime/proc.go:283 +0x29d fp=0xc000313fe0 sp=0xc000313f50 pc=0x610fe9153ebd
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000313fe8 sp=0xc000313fe0 pc=0x610fe918f481

goroutine 2 gp=0xc000002e00 m=nil [force gc (idle), 5 minutes]:
runtime.gopark(0x239b11b73f90?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x610fe9187d4e
runtime.goparkunlock(...)
    runtime/proc.go:441
runtime.forcegchelper()
    runtime/proc.go:348 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x610fe91541f8
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x610fe918f481
created by runtime.init.7 in goroutine 1
    runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x610fe9187d4e
runtime.goparkunlock(...)
    runtime/proc.go:441
runtime.bgsweep(0xc0000aa000)
    runtime/mgcsweep.go:316 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x610fe913e99f
runtime.gcenable.gowrap1()
    runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x610fe9132d85
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x610fe918f481
created by runtime.gcenable in goroutine 1
    runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x496245?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x610fe9187d4e
runtime.goparkunlock(...)
    runtime/proc.go:441
runtime.(*scavengerState).park(0x610fead929a0)
    runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x610fe913c3e9
runtime.bgscavenge(0xc0000aa000)
    runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x610fe913c979
runtime.gcenable.gowrap2()
    runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x610fe9132d25
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x610fe918f481
created by runtime.gcenable in goroutine 1
    runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait, 5 minutes]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000084688?)
    runtime/proc.go:435 +0xce fp=0xc000084630 sp=0xc000084610 pc=0x610fe9187d4e
runtime.runfinq()
    runtime/mfinal.go:196 +0x107 fp=0xc0000847e0 sp=0xc000084630 pc=0x610fe9131d47
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x610fe918f481
created by runtime.createfing in goroutine 1
    runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc0001dc8c0 m=nil [chan receive]:
runtime.gopark(0xc000235540?, 0xc0018f08b8?, 0x60?, 0x67?, 0x610fe926c9c8?)
    runtime/proc.go:435 +0xce fp=0xc000086718 sp=0xc0000866f8 pc=0x610fe9187d4e
runtime.chanrecv(0xc0000b8310, 0x0, 0x1)
    runtime/chan.go:664 +0x445 fp=0xc000086790 sp=0xc000086718 pc=0x610fe9123725
runtime.chanrecv1(0x0?, 0x0?)
    runtime/chan.go:506 +0x12 fp=0xc0000867b8 sp=0xc000086790 pc=0x610fe91232b2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
    runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
    runtime/mgc.go:1799 +0x2f fp=0xc0000867e0 sp=0xc0000867b8 pc=0x610fe9135f2f
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x610fe918f481
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
    runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc0001dce00 m=nil [GC worker (idle)]:
runtime.gopark(0x23c1603d3d41?, 0x3?, 0xc3?, 0x71?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 8 gp=0xc0001dcfc0 m=nil [GC worker (idle)]:
runtime.gopark(0x610feae411e0?, 0x1?, 0x95?, 0x3a?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000087738 sp=0xc000087718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc0000877c8 sp=0xc000087738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc0000877e0 sp=0xc0000877c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 9 gp=0xc0001dd180 m=nil [GC worker (idle)]:
runtime.gopark(0x610feae411e0?, 0x1?, 0xab?, 0x3?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000087f38 sp=0xc000087f18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc000087fc8 sp=0xc000087f38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc000087fe0 sp=0xc000087fc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 10 gp=0xc0001dd340 m=nil [GC worker (idle)]:
runtime.gopark(0x23bef7e001da?, 0x3?, 0x95?, 0x7?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000080738 sp=0xc000080718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc0000807c8 sp=0xc000080738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 11 gp=0xc0001dd500 m=nil [GC worker (idle)]:
runtime.gopark(0x23c1603ccc08?, 0x3?, 0xa5?, 0xc7?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 12 gp=0xc0001dd6c0 m=nil [GC worker (idle)]:
runtime.gopark(0x23c15fb13c79?, 0x1?, 0x11?, 0x7e?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000081738 sp=0xc000081718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc0000817c8 sp=0xc000081738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc0000817e0 sp=0xc0000817c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 13 gp=0xc0001dd880 m=nil [GC worker (idle)]:
runtime.gopark(0x23c1603d1eae?, 0x3?, 0x8e?, 0xb2?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 14 gp=0xc0001dda40 m=nil [GC worker (idle)]:
runtime.gopark(0x610feae411e0?, 0x1?, 0x9d?, 0xd3?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000082738 sp=0xc000082718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc0000827c8 sp=0xc000082738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc0000827e0 sp=0xc0000827c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000827e8 sp=0xc0000827e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 15 gp=0xc0001ddc00 m=nil [GC worker (idle)]:
runtime.gopark(0x610feae411e0?, 0x1?, 0x88?, 0xc1?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000082f38 sp=0xc000082f18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc000082fc8 sp=0xc000082f38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc000082fe0 sp=0xc000082fc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000082fe8 sp=0xc000082fe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 16 gp=0xc0001dddc0 m=nil [GC worker (idle)]:
runtime.gopark(0x23c1603f6bf7?, 0x1?, 0xfa?, 0xa7?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000083738 sp=0xc000083718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc0000837c8 sp=0xc000083738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc0000837e0 sp=0xc0000837c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0000837e8 sp=0xc0000837e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc000524000 m=nil [GC worker (idle)]:
runtime.gopark(0x23c15fb13c5e?, 0x3?, 0x66?, 0xb1?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000083f38 sp=0xc000083f18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc000083fc8 sp=0xc000083f38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc000083fe0 sp=0xc000083fc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000083fe8 sp=0xc000083fe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc0005241c0 m=nil [GC worker (idle)]:
runtime.gopark(0x23c15fb136df?, 0x3?, 0xbf?, 0xf9?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc00052a738 sp=0xc00052a718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc00052a7c8 sp=0xc00052a738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc00052a7e0 sp=0xc00052a7c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc00052a7e8 sp=0xc00052a7e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc000306000 m=nil [GC worker (idle)]:
runtime.gopark(0x23c1603c83f3?, 0x3?, 0xbc?, 0xf1?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc000526738 sp=0xc000526718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc0005267c8 sp=0xc000526738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc0005267e0 sp=0xc0005267c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0005267e8 sp=0xc0005267e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 50 gp=0xc000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x610feae411e0?, 0x1?, 0x9a?, 0xa1?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 51 gp=0xc000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x23c15fb1177d?, 0x1?, 0x0?, 0x1c?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000524380 m=nil [GC worker (idle)]:
runtime.gopark(0x23c1607d4ede?, 0x3?, 0x5f?, 0x98?, 0x0?)
    runtime/proc.go:435 +0xce fp=0xc00052af38 sp=0xc00052af18 pc=0x610fe9187d4e
runtime.gcBgMarkWorker(0xc0000b98f0)
    runtime/mgc.go:1423 +0xe9 fp=0xc00052afc8 sp=0xc00052af38 pc=0x610fe9135249
runtime.gcBgMarkStartWorkers.gowrap1()
    runtime/mgc.go:1339 +0x25 fp=0xc00052afe0 sp=0xc00052afc8 pc=0x610fe9135125
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc00052afe8 sp=0xc00052afe0 pc=0x610fe918f481
created by runtime.gcBgMarkStartWorkers in goroutine 1
    runtime/mgc.go:1339 +0x105

goroutine 72 gp=0xc000583180 m=nil [select, 2 minutes]:
runtime.gopark(0xc000315a10?, 0x2?, 0x0?, 0x0?, 0xc000315874?)
    runtime/proc.go:435 +0xce fp=0xc0003156a0 sp=0xc000315680 pc=0x610fe9187d4e
runtime.selectgo(0xc000315a10, 0xc000315870, 0x7b52?, 0x0, 0x4?, 0x1)
    runtime/select.go:351 +0x837 fp=0xc0003157d8 sp=0xc0003156a0 pc=0x610fe91663b7
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc0007285a0, {0x610fea4fb4c8, 0xc000000000}, 0xc001b00140)
    github.com/ollama/ollama/runner/ollamarunner/runner.go:680 +0xb65 fp=0xc000315ac0 sp=0xc0003157d8 pc=0x610fe967b3c5
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x610fea4fb4c8?, 0xc000000000?}, 0xc000315b40?)
    <autogenerated>:1 +0x36 fp=0xc000315af0 sp=0xc000315ac0 pc=0x610fe967e796
net/http.HandlerFunc.ServeHTTP(0xc0005b52c0?, {0x610fea4fb4c8?, 0xc000000000?}, 0xc000315b60?)
    net/http/server.go:2294 +0x29 fp=0xc000315b18 sp=0xc000315af0 pc=0x610fe9485c49
net/http.(*ServeMux).ServeHTTP(0x610fe912c265?, {0x610fea4fb4c8, 0xc000000000}, 0xc001b00140)
    net/http/server.go:2822 +0x1c4 fp=0xc000315b68 sp=0xc000315b18 pc=0x610fe9487b44
net/http.serverHandler.ServeHTTP({0x610fea4f7b10?}, {0x610fea4fb4c8?, 0xc000000000?}, 0x1?)
    net/http/server.go:3301 +0x8e fp=0xc000315b98 sp=0xc000315b68 pc=0x610fe94a55ce
net/http.(*conn).serve(0xc0005443f0, {0x610fea4fd758, 0xc0000ffaa0})
    net/http/server.go:2102 +0x625 fp=0xc000315fb8 sp=0xc000315b98 pc=0x610fe9484145
net/http.(*Server).Serve.gowrap3()
    net/http/server.go:3454 +0x28 fp=0xc000315fe0 sp=0xc000315fb8 pc=0x610fe9489a08
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc000315fe8 sp=0xc000315fe0 pc=0x610fe918f481
created by net/http.(*Server).Serve in goroutine 1
    net/http/server.go:3454 +0x485

goroutine 73 gp=0xc000307a40 m=nil [IO wait, 2 minutes]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?)
    runtime/proc.go:435 +0xce fp=0xc0014b55d8 sp=0xc0014b55b8 pc=0x610fe9187d4e
runtime.netpollblock(0x610fe91ab0b8?, 0xe9120b46?, 0xf?)
    runtime/netpoll.go:575 +0xf7 fp=0xc0014b5610 sp=0xc0014b55d8 pc=0x610fe914c837
internal/poll.runtime_pollWait(0x7b5e085c6d98, 0x72)
    runtime/netpoll.go:351 +0x85 fp=0xc0014b5630 sp=0xc0014b5610 pc=0x610fe9186f65
internal/poll.(*pollDesc).wait(0xc00004e080?, 0xc00221c0a1?, 0x0)
    internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0014b5658 sp=0xc0014b5630 pc=0x610fe920e3a7
internal/poll.(*pollDesc).waitRead(...)
    internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00004e080, {0xc00221c0a1, 0x1, 0x1})
    internal/poll/fd_unix.go:165 +0x27a fp=0xc0014b56f0 sp=0xc0014b5658 pc=0x610fe920f69a
net.(*netFD).Read(0xc00004e080, {0xc00221c0a1?, 0xc0018f2058?, 0xc0014b5770?})
    net/fd_posix.go:55 +0x25 fp=0xc0014b5738 sp=0xc0014b56f0 pc=0x610fe9283de5
net.(*conn).Read(0xc001f10000, {0xc00221c0a1?, 0x0?, 0x0?})
    net/net.go:194 +0x45 fp=0xc0014b5780 sp=0xc0014b5738 pc=0x610fe92921a5
net/http.(*connReader).backgroundRead(0xc00221c090)
    net/http/server.go:690 +0x37 fp=0xc0014b57c8 sp=0xc0014b5780 pc=0x610fe947e017
net/http.(*connReader).startBackgroundRead.gowrap2()
    net/http/server.go:686 +0x25 fp=0xc0014b57e0 sp=0xc0014b57c8 pc=0x610fe947df45
runtime.goexit({})
    runtime/asm_amd64.s:1700 +0x1 fp=0xc0014b57e8 sp=0xc0014b57e0 pc=0x610fe918f481
created by net/http.(*connReader).startBackgroundRead in goroutine 72
    net/http/server.go:686 +0xb6

rax    0x0
rbx    0x298
rcx    0x7b5e4f96db2c
rdx    0x6
rdi    0x288
rsi    0x298
rbp    0x7b5d2d7fd670
rsp    0x7b5d2d7fd630
r8     0x0
r9     0x0
r10    0x8
r11    0x246
r12    0x6
r13    0x4d
r14    0x16
r15    0x7b572c04f1c0
rip    0x7b5e4f96db2c
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
time=2025-08-06T19:48:17.187Z level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:44317/completion\": EOF"
[GIN] 2025/08/06 - 19:48:17 | 200 | 53.432952415s |      172.18.0.1 | POST     "/api/chat"
time=2025-08-06T19:48:17.187Z level=DEBUG source=sched.go:432 msg="context for request finished" runner.name=registry.ollama.ai/library/gpt-oss:20b runner.inference=cuda runner.devices=1 runner.size="22.6 GiB" runner.vram="22.6 GiB" runner.parallel=1 runner.pid=648 runner.model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 runner.num_ctx=36864
time=2025-08-06T19:48:17.187Z level=DEBUG source=sched.go:341 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gpt-oss:20b runner.inference=cuda runner.devices=1 runner.size="22.6 GiB" runner.vram="22.6 GiB" runner.parallel=1 runner.pid=648 runner.model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 runner.num_ctx=36864 duration=2562047h47m16.854775807s
time=2025-08-06T19:48:17.187Z level=DEBUG source=sched.go:359 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gpt-oss:20b runner.inference=cuda runner.devices=1 runner.size="22.6 GiB" runner.vram="22.6 GiB" runner.parallel=1 runner.pid=648 runner.model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 runner.num_ctx=36864 refCount=0
time=2025-08-06T19:48:17.240Z level=ERROR source=server.go:464 msg="llama runner terminated" error="exit status 2"

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.11.3

Originally created by @mikhail-shevtsov-wiregate on GitHub (Aug 6, 2025). Original GitHub issue: https://github.com/ollama/ollama/issues/11753 ### What is the issue? I've noticed memory leak when using large context with gpt-oss:20b on Nvidia RTX 3090 In my example I had `num_ctx: 36864` and actual context `~34k` After 30 seconds it crashes with OOM. I don't see such problem with qwen3 models. Memory consumption is static. https://github.com/user-attachments/assets/164417bd-7e40-46a9-abc0-15f18272e5af ### Relevant log output <details> <summary>Full log</summary> ```shell releasing cuda driver library time=2025-08-06T19:42:45.813Z level=INFO source=server.go:175 msg=offload library=cuda layers.requested=-1 layers.model=25 layers.offload=25 layers.split="" memory.available="[23.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="22.6 GiB" memory.required.partial="22.6 GiB" memory.required.kv="972.0 MiB" memory.required.allocations="[22.6 GiB]" memory.weights.total="11.7 GiB" memory.weights.repeating="10.7 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="9.0 GiB" memory.graph.partial="9.0 GiB" time=2025-08-06T19:42:45.813Z level=WARN source=server.go:211 msg="flash attention enabled but not supported by model" time=2025-08-06T19:42:45.813Z level=WARN source=server.go:229 msg="quantized kv cache requested but flash attention disabled" type=q8_0 time=2025-08-06T19:42:45.813Z level=DEBUG source=server.go:291 msg="compatible gpu libraries" compatible=[] time=2025-08-06T19:42:45.882Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32 time=2025-08-06T19:42:45.883Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=tokenizer.ggml.pretokenizer default="[^\\r\\n\\p{L}\\p{N}]?[\\p{Lu}\\p{Lt}\\p{Lm}\\p{Lo}\\p{M}]*[\\p{Ll}\\p{Lm}\\p{Lo}\\p{M}]+(?i:'s|'t|'re|'ve|'m|'ll|'d)?|[^\\r\\n\\p{L}\\p{N}]?[\\p{Lu}\\p{Lt}\\p{Lm}\\p{Lo}\\p{M}]+[\\p{Ll}\\p{Lm}\\p{Lo}\\p{M}]*(?i:'s|'t|'re|'ve|'m|'ll|'d)?|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n/]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-08-06T19:42:45.883Z level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 --ctx-size 36864 --batch-size 512 --n-gpu-layers 25 --threads 8 --parallel 1 --port 44317" time=2025-08-06T19:42:45.883Z level=DEBUG source=server.go:439 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_KV_CACHE_TYPE=q8_0 OLLAMA_DEBUG=1 OLLAMA_FLASH_ATTENTION=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/ollama OLLAMA_HOST=0.0.0.0:11434 OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_LIBRARY_PATH=/usr/lib/ollama CUDA_VISIBLE_DEVICES=GPU-4a702f36-6e27-c7db-15af-79cfe15ee9df time=2025-08-06T19:42:45.884Z level=INFO source=sched.go:481 msg="loaded runners" count=1 time=2025-08-06T19:42:45.884Z level=INFO source=server.go:598 msg="waiting for llama runner to start responding" time=2025-08-06T19:42:45.884Z level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding" time=2025-08-06T19:42:45.900Z level=INFO source=runner.go:925 msg="starting ollama engine" time=2025-08-06T19:42:45.901Z level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:44317" time=2025-08-06T19:42:45.983Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32 time=2025-08-06T19:42:45.983Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.name default="" time=2025-08-06T19:42:45.984Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.description default="" time=2025-08-06T19:42:45.984Z level=INFO source=ggml.go:92 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=315 num_key_values=30 time=2025-08-06T19:42:45.984Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes load_backend: loaded CUDA backend from /usr/lib/ollama/libggml-cuda.so load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so time=2025-08-06T19:42:46.050Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-08-06T19:42:46.135Z level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model" time=2025-08-06T19:42:46.141Z level=INFO source=ggml.go:367 msg="offloading 24 repeating layers to GPU" time=2025-08-06T19:42:46.141Z level=INFO source=ggml.go:373 msg="offloading output layer to GPU" time=2025-08-06T19:42:46.141Z level=INFO source=ggml.go:378 msg="offloaded 25/25 layers to GPU" time=2025-08-06T19:42:46.141Z level=INFO source=ggml.go:381 msg="model weights" buffer=CUDA0 size="11.7 GiB" time=2025-08-06T19:42:46.141Z level=INFO source=ggml.go:381 msg="model weights" buffer=CPU size="1.1 GiB" time=2025-08-06T19:42:46.142Z level=DEBUG source=ggml.go:208 msg="key with type not found" key=tokenizer.ggml.pretokenizer default="[^\\r\\n\\p{L}\\p{N}]?[\\p{Lu}\\p{Lt}\\p{Lm}\\p{Lo}\\p{M}]*[\\p{Ll}\\p{Lm}\\p{Lo}\\p{M}]+(?i:'s|'t|'re|'ve|'m|'ll|'d)?|[^\\r\\n\\p{L}\\p{N}]?[\\p{Lu}\\p{Lt}\\p{Lm}\\p{Lo}\\p{M}]+[\\p{Ll}\\p{Lm}\\p{Lo}\\p{M}]*(?i:'s|'t|'re|'ve|'m|'ll|'d)?|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n/]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-08-06T19:42:46.158Z level=DEBUG source=ggml.go:654 msg="compute graph" nodes=1847 splits=2 time=2025-08-06T19:42:46.158Z level=INFO source=ggml.go:672 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="9.1 GiB" time=2025-08-06T19:42:46.158Z level=INFO source=ggml.go:672 msg="compute graph" backend=CPU buffer_type=CPU size="5.6 MiB" time=2025-08-06T19:42:46.158Z level=DEBUG source=runner.go:883 msg=memory allocated.InputWeights=1158266880A allocated.CPU.Graph=5898240A allocated.CUDA0.ID=GPU-4a702f36-6e27-c7db-15af-79cfe15ee9df allocated.CUDA0.Weights="[477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 477075840A 1158278400A]" allocated.CUDA0.Cache="[9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 9437184A 75497472A 0U]" allocated.CUDA0.Graph=9779415296A time=2025-08-06T19:42:46.386Z level=DEBUG source=server.go:643 msg="model load progress 0.03" time=2025-08-06T19:42:46.637Z level=DEBUG source=server.go:643 msg="model load progress 0.07" time=2025-08-06T19:42:46.888Z level=DEBUG source=server.go:643 msg="model load progress 0.10" time=2025-08-06T19:42:47.139Z level=DEBUG source=server.go:643 msg="model load progress 0.13" time=2025-08-06T19:42:47.390Z level=DEBUG source=server.go:643 msg="model load progress 0.16" time=2025-08-06T19:42:47.641Z level=DEBUG source=server.go:643 msg="model load progress 0.20" time=2025-08-06T19:42:47.892Z level=DEBUG source=server.go:643 msg="model load progress 0.23" time=2025-08-06T19:42:48.143Z level=DEBUG source=server.go:643 msg="model load progress 0.27" time=2025-08-06T19:42:48.393Z level=DEBUG source=server.go:643 msg="model load progress 0.30" time=2025-08-06T19:42:48.644Z level=DEBUG source=server.go:643 msg="model load progress 0.34" time=2025-08-06T19:42:48.895Z level=DEBUG source=server.go:643 msg="model load progress 0.37" time=2025-08-06T19:42:49.146Z level=DEBUG source=server.go:643 msg="model load progress 0.40" time=2025-08-06T19:42:49.397Z level=DEBUG source=server.go:643 msg="model load progress 0.43" time=2025-08-06T19:42:49.648Z level=DEBUG source=server.go:643 msg="model load progress 0.47" time=2025-08-06T19:42:49.899Z level=DEBUG source=server.go:643 msg="model load progress 0.50" time=2025-08-06T19:42:50.149Z level=DEBUG source=server.go:643 msg="model load progress 0.53" time=2025-08-06T19:42:50.400Z level=DEBUG source=server.go:643 msg="model load progress 0.56" time=2025-08-06T19:42:50.651Z level=DEBUG source=server.go:643 msg="model load progress 0.60" time=2025-08-06T19:42:50.902Z level=DEBUG source=server.go:643 msg="model load progress 0.63" time=2025-08-06T19:42:51.153Z level=DEBUG source=server.go:643 msg="model load progress 0.66" time=2025-08-06T19:42:51.404Z level=DEBUG source=server.go:643 msg="model load progress 0.69" time=2025-08-06T19:42:51.655Z level=DEBUG source=server.go:643 msg="model load progress 0.73" time=2025-08-06T19:42:51.906Z level=DEBUG source=server.go:643 msg="model load progress 0.79" time=2025-08-06T19:42:52.156Z level=DEBUG source=server.go:643 msg="model load progress 0.85" time=2025-08-06T19:42:52.407Z level=DEBUG source=server.go:643 msg="model load progress 0.92" time=2025-08-06T19:42:52.658Z level=DEBUG source=server.go:643 msg="model load progress 0.95" time=2025-08-06T19:42:52.909Z level=DEBUG source=server.go:643 msg="model load progress 0.97" time=2025-08-06T19:42:53.160Z level=DEBUG source=server.go:643 msg="model load progress 1.00" time=2025-08-06T19:42:53.411Z level=INFO source=server.go:637 msg="llama runner started in 7.53 seconds" time=2025-08-06T19:42:53.411Z level=DEBUG source=sched.go:493 msg="finished setting up" runner.name=registry.ollama.ai/library/gpt-oss:20b runner.inference=cuda runner.devices=1 runner.size="22.6 GiB" runner.vram="22.6 GiB" runner.parallel=1 runner.pid=648 runner.model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 runner.num_ctx=36864 time=2025-08-06T19:42:53.411Z level=DEBUG source=server.go:736 msg="completion request" images=0 prompt=2876 format="" time=2025-08-06T19:42:53.491Z level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=0 prompt=768 used=0 remaining=768 [GIN] 2025/08/06 - 19:43:01 | 200 | 18.015478674s | 172.18.0.1 | POST "/api/chat" time=2025-08-06T19:43:01.888Z level=DEBUG source=sched.go:501 msg="context for request finished" time=2025-08-06T19:43:01.889Z level=DEBUG source=sched.go:341 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gpt-oss:20b runner.inference=cuda runner.devices=1 runner.size="22.6 GiB" runner.vram="22.6 GiB" runner.parallel=1 runner.pid=648 runner.model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 runner.num_ctx=36864 duration=2562047h47m16.854775807s time=2025-08-06T19:43:01.889Z level=DEBUG source=sched.go:359 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gpt-oss:20b runner.inference=cuda runner.devices=1 runner.size="22.6 GiB" runner.vram="22.6 GiB" runner.parallel=1 runner.pid=648 runner.model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 runner.num_ctx=36864 refCount=0 [GIN] 2025/08/06 - 19:45:45 | 200 | 25.857µs | 127.0.0.1 | HEAD "/" [GIN] 2025/08/06 - 19:45:45 | 200 | 42.668µs | 127.0.0.1 | GET "/api/ps" time=2025-08-06T19:47:23.851Z level=DEBUG source=sched.go:613 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 time=2025-08-06T19:47:23.852Z level=DEBUG source=server.go:736 msg="completion request" images=0 prompt=137980 format="" time=2025-08-06T19:47:23.957Z level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1242 prompt=31570 used=64 remaining=31506 time=2025-08-06T19:47:29.468Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache" time=2025-08-06T19:47:35.977Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache" time=2025-08-06T19:47:36.908Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache" time=2025-08-06T19:47:44.951Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache" time=2025-08-06T19:47:46.040Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache" time=2025-08-06T19:47:55.345Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache" time=2025-08-06T19:47:56.641Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache" time=2025-08-06T19:48:07.482Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache" time=2025-08-06T19:48:08.981Z level=DEBUG source=causal.go:426 msg="defragmenting kv cache" CUDA error: out of memory current device: 0, in function alloc at //ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:452 cuMemCreate(&handle, reserve_size, &prop, 0) //ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:77: CUDA error SIGSEGV: segmentation violation PC=0x7b5de1024fb7 m=14 sigcode=1 addr=0x204a03f7c signal arrived during cgo execution goroutine 22 gp=0xc000306e00 m=14 mp=0xc000581008 [syscall]: runtime.cgocall(0x610fe9e51720, 0xc0042f3a58) runtime/cgocall.go:167 +0x4b fp=0xc0042f3a30 sp=0xc0042f39f8 pc=0x610fe91848cb github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x7b5df90110f0, 0x7b572c006150) _cgo_gotypes.go:886 +0x4a fp=0xc0042f3a58 sp=0xc0042f3a30 pc=0x610fe95bec6a github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute.func1(...) github.com/ollama/ollama/ml/backend/ggml/ggml.go:631 github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc001a68000, {0xc034cbc360, 0x1, 0x0?}) github.com/ollama/ollama/ml/backend/ggml/ggml.go:631 +0x9d fp=0xc0042f3b00 sp=0xc0042f3a58 pc=0x610fe95ca17d github.com/ollama/ollama/model.Forward({0x610fea505a90, 0xc001a68000}, {0x610fea4fc2b0, 0xc0000e96b0}, {0xc0013b8800, 0x200, 0x200}, {{0x610fea5106e8, 0xc0018f0018}, {0x0, ...}, ...}) github.com/ollama/ollama/model/model.go:305 +0x2a7 fp=0xc0042f3be8 sp=0xc0042f3b00 pc=0x610fe95d8147 github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc0007285a0) github.com/ollama/ollama/runner/ollamarunner/runner.go:480 +0x4c5 fp=0xc0042f3f98 sp=0xc0042f3be8 pc=0x610fe9679085 github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0007285a0, {0x610fea4fd790, 0xc000690690}) github.com/ollama/ollama/runner/ollamarunner/runner.go:362 +0x4e fp=0xc0042f3fb8 sp=0xc0042f3f98 pc=0x610fe9678b6e github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2() github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0x28 fp=0xc0042f3fe0 sp=0xc0042f3fb8 pc=0x610fe967e2c8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0042f3fe8 sp=0xc0042f3fe0 pc=0x610fe918f481 created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74 goroutine 1 gp=0xc000002380 m=nil [IO wait, 2 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000313650 sp=0xc000313630 pc=0x610fe9187d4e runtime.netpollblock(0xc0003136a0?, 0xe9120b46?, 0xf?) runtime/netpoll.go:575 +0xf7 fp=0xc000313688 sp=0xc000313650 pc=0x610fe914c837 internal/poll.runtime_pollWait(0x7b5e085c6eb0, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc0003136a8 sp=0xc000313688 pc=0x610fe9186f65 internal/poll.(*pollDesc).wait(0xc000716280?, 0x90012ae3e?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0003136d0 sp=0xc0003136a8 pc=0x610fe920e3a7 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0xc000716280) internal/poll/fd_unix.go:620 +0x295 fp=0xc000313778 sp=0xc0003136d0 pc=0x610fe9213775 net.(*netFD).accept(0xc000716280) net/fd_unix.go:172 +0x29 fp=0xc000313830 sp=0xc000313778 pc=0x610fe9285d89 net.(*TCPListener).accept(0xc0001406c0) net/tcpsock_posix.go:159 +0x1b fp=0xc000313880 sp=0xc000313830 pc=0x610fe929b73b net.(*TCPListener).Accept(0xc0001406c0) net/tcpsock.go:380 +0x30 fp=0xc0003138b0 sp=0xc000313880 pc=0x610fe929a5f0 net/http.(*onceCloseListener).Accept(0xc0005443f0?) <autogenerated>:1 +0x24 fp=0xc0003138c8 sp=0xc0003138b0 pc=0x610fe94b1d44 net/http.(*Server).Serve(0xc0001ff400, {0x610fea4fb2e8, 0xc0001406c0}) net/http/server.go:3424 +0x30c fp=0xc0003139f8 sp=0xc0003138c8 pc=0x610fe948960c github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000034150, 0xe, 0xf}) github.com/ollama/ollama/runner/ollamarunner/runner.go:984 +0xe09 fp=0xc000313d08 sp=0xc0003139f8 pc=0x610fe967e029 github.com/ollama/ollama/runner.Execute({0xc000034130?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc000313d30 sp=0xc000313d08 pc=0x610fe967e929 github.com/ollama/ollama/cmd.NewCLI.func2(0xc0001ff200?, {0x610fea03e07e?, 0x4?, 0x610fea03e082?}) github.com/ollama/ollama/cmd/cmd.go:1583 +0x45 fp=0xc000313d58 sp=0xc000313d30 pc=0x610fe9de3685 github.com/spf13/cobra.(*Command).execute(0xc000546f08, {0xc0005a8870, 0xf, 0xf}) github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000313e78 sp=0xc000313d58 pc=0x610fe92ff3dc github.com/spf13/cobra.(*Command).ExecuteC(0xc000734908) github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000313f30 sp=0xc000313e78 pc=0x610fe92ffc25 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000313f50 sp=0xc000313f30 pc=0x610fe9de416d runtime.main() runtime/proc.go:283 +0x29d fp=0xc000313fe0 sp=0xc000313f50 pc=0x610fe9153ebd runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000313fe8 sp=0xc000313fe0 pc=0x610fe918f481 goroutine 2 gp=0xc000002e00 m=nil [force gc (idle), 5 minutes]: runtime.gopark(0x239b11b73f90?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x610fe9187d4e runtime.goparkunlock(...) runtime/proc.go:441 runtime.forcegchelper() runtime/proc.go:348 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x610fe91541f8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x610fe918f481 created by runtime.init.7 in goroutine 1 runtime/proc.go:336 +0x1a goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x610fe9187d4e runtime.goparkunlock(...) runtime/proc.go:441 runtime.bgsweep(0xc0000aa000) runtime/mgcsweep.go:316 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x610fe913e99f runtime.gcenable.gowrap1() runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x610fe9132d85 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x610fe918f481 created by runtime.gcenable in goroutine 1 runtime/mgc.go:204 +0x66 goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x496245?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x610fe9187d4e runtime.goparkunlock(...) runtime/proc.go:441 runtime.(*scavengerState).park(0x610fead929a0) runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x610fe913c3e9 runtime.bgscavenge(0xc0000aa000) runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x610fe913c979 runtime.gcenable.gowrap2() runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x610fe9132d25 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x610fe918f481 created by runtime.gcenable in goroutine 1 runtime/mgc.go:205 +0xa5 goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait, 5 minutes]: runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000084688?) runtime/proc.go:435 +0xce fp=0xc000084630 sp=0xc000084610 pc=0x610fe9187d4e runtime.runfinq() runtime/mfinal.go:196 +0x107 fp=0xc0000847e0 sp=0xc000084630 pc=0x610fe9131d47 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x610fe918f481 created by runtime.createfing in goroutine 1 runtime/mfinal.go:166 +0x3d goroutine 6 gp=0xc0001dc8c0 m=nil [chan receive]: runtime.gopark(0xc000235540?, 0xc0018f08b8?, 0x60?, 0x67?, 0x610fe926c9c8?) runtime/proc.go:435 +0xce fp=0xc000086718 sp=0xc0000866f8 pc=0x610fe9187d4e runtime.chanrecv(0xc0000b8310, 0x0, 0x1) runtime/chan.go:664 +0x445 fp=0xc000086790 sp=0xc000086718 pc=0x610fe9123725 runtime.chanrecv1(0x0?, 0x0?) runtime/chan.go:506 +0x12 fp=0xc0000867b8 sp=0xc000086790 pc=0x610fe91232b2 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() runtime/mgc.go:1799 +0x2f fp=0xc0000867e0 sp=0xc0000867b8 pc=0x610fe9135f2f runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x610fe918f481 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 runtime/mgc.go:1794 +0x85 goroutine 7 gp=0xc0001dce00 m=nil [GC worker (idle)]: runtime.gopark(0x23c1603d3d41?, 0x3?, 0xc3?, 0x71?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 8 gp=0xc0001dcfc0 m=nil [GC worker (idle)]: runtime.gopark(0x610feae411e0?, 0x1?, 0x95?, 0x3a?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000087738 sp=0xc000087718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc0000877c8 sp=0xc000087738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000877e0 sp=0xc0000877c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 9 gp=0xc0001dd180 m=nil [GC worker (idle)]: runtime.gopark(0x610feae411e0?, 0x1?, 0xab?, 0x3?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000087f38 sp=0xc000087f18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc000087fc8 sp=0xc000087f38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000087fe0 sp=0xc000087fc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 10 gp=0xc0001dd340 m=nil [GC worker (idle)]: runtime.gopark(0x23bef7e001da?, 0x3?, 0x95?, 0x7?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000080738 sp=0xc000080718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc0000807c8 sp=0xc000080738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 11 gp=0xc0001dd500 m=nil [GC worker (idle)]: runtime.gopark(0x23c1603ccc08?, 0x3?, 0xa5?, 0xc7?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 12 gp=0xc0001dd6c0 m=nil [GC worker (idle)]: runtime.gopark(0x23c15fb13c79?, 0x1?, 0x11?, 0x7e?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000081738 sp=0xc000081718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc0000817c8 sp=0xc000081738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000817e0 sp=0xc0000817c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 13 gp=0xc0001dd880 m=nil [GC worker (idle)]: runtime.gopark(0x23c1603d1eae?, 0x3?, 0x8e?, 0xb2?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 14 gp=0xc0001dda40 m=nil [GC worker (idle)]: runtime.gopark(0x610feae411e0?, 0x1?, 0x9d?, 0xd3?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000082738 sp=0xc000082718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc0000827c8 sp=0xc000082738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000827e0 sp=0xc0000827c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000827e8 sp=0xc0000827e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 15 gp=0xc0001ddc00 m=nil [GC worker (idle)]: runtime.gopark(0x610feae411e0?, 0x1?, 0x88?, 0xc1?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000082f38 sp=0xc000082f18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc000082fc8 sp=0xc000082f38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000082fe0 sp=0xc000082fc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000082fe8 sp=0xc000082fe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 16 gp=0xc0001dddc0 m=nil [GC worker (idle)]: runtime.gopark(0x23c1603f6bf7?, 0x1?, 0xfa?, 0xa7?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000083738 sp=0xc000083718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc0000837c8 sp=0xc000083738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000837e0 sp=0xc0000837c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000837e8 sp=0xc0000837e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 18 gp=0xc000524000 m=nil [GC worker (idle)]: runtime.gopark(0x23c15fb13c5e?, 0x3?, 0x66?, 0xb1?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000083f38 sp=0xc000083f18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc000083fc8 sp=0xc000083f38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000083fe0 sp=0xc000083fc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000083fe8 sp=0xc000083fe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 19 gp=0xc0005241c0 m=nil [GC worker (idle)]: runtime.gopark(0x23c15fb136df?, 0x3?, 0xbf?, 0xf9?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00052a738 sp=0xc00052a718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc00052a7c8 sp=0xc00052a738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00052a7e0 sp=0xc00052a7c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00052a7e8 sp=0xc00052a7e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 34 gp=0xc000306000 m=nil [GC worker (idle)]: runtime.gopark(0x23c1603c83f3?, 0x3?, 0xbc?, 0xf1?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000526738 sp=0xc000526718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc0005267c8 sp=0xc000526738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0005267e0 sp=0xc0005267c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0005267e8 sp=0xc0005267e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 50 gp=0xc000102380 m=nil [GC worker (idle)]: runtime.gopark(0x610feae411e0?, 0x1?, 0x9a?, 0xa1?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 51 gp=0xc000102540 m=nil [GC worker (idle)]: runtime.gopark(0x23c15fb1177d?, 0x1?, 0x0?, 0x1c?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 20 gp=0xc000524380 m=nil [GC worker (idle)]: runtime.gopark(0x23c1607d4ede?, 0x3?, 0x5f?, 0x98?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00052af38 sp=0xc00052af18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc00052afc8 sp=0xc00052af38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00052afe0 sp=0xc00052afc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00052afe8 sp=0xc00052afe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 72 gp=0xc000583180 m=nil [select, 2 minutes]: runtime.gopark(0xc000315a10?, 0x2?, 0x0?, 0x0?, 0xc000315874?) runtime/proc.go:435 +0xce fp=0xc0003156a0 sp=0xc000315680 pc=0x610fe9187d4e runtime.selectgo(0xc000315a10, 0xc000315870, 0x7b52?, 0x0, 0x4?, 0x1) runtime/select.go:351 +0x837 fp=0xc0003157d8 sp=0xc0003156a0 pc=0x610fe91663b7 github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc0007285a0, {0x610fea4fb4c8, 0xc000000000}, 0xc001b00140) github.com/ollama/ollama/runner/ollamarunner/runner.go:680 +0xb65 fp=0xc000315ac0 sp=0xc0003157d8 pc=0x610fe967b3c5 github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x610fea4fb4c8?, 0xc000000000?}, 0xc000315b40?) <autogenerated>:1 +0x36 fp=0xc000315af0 sp=0xc000315ac0 pc=0x610fe967e796 net/http.HandlerFunc.ServeHTTP(0xc0005b52c0?, {0x610fea4fb4c8?, 0xc000000000?}, 0xc000315b60?) net/http/server.go:2294 +0x29 fp=0xc000315b18 sp=0xc000315af0 pc=0x610fe9485c49 net/http.(*ServeMux).ServeHTTP(0x610fe912c265?, {0x610fea4fb4c8, 0xc000000000}, 0xc001b00140) net/http/server.go:2822 +0x1c4 fp=0xc000315b68 sp=0xc000315b18 pc=0x610fe9487b44 net/http.serverHandler.ServeHTTP({0x610fea4f7b10?}, {0x610fea4fb4c8?, 0xc000000000?}, 0x1?) net/http/server.go:3301 +0x8e fp=0xc000315b98 sp=0xc000315b68 pc=0x610fe94a55ce net/http.(*conn).serve(0xc0005443f0, {0x610fea4fd758, 0xc0000ffaa0}) net/http/server.go:2102 +0x625 fp=0xc000315fb8 sp=0xc000315b98 pc=0x610fe9484145 net/http.(*Server).Serve.gowrap3() net/http/server.go:3454 +0x28 fp=0xc000315fe0 sp=0xc000315fb8 pc=0x610fe9489a08 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000315fe8 sp=0xc000315fe0 pc=0x610fe918f481 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3454 +0x485 goroutine 73 gp=0xc000307a40 m=nil [IO wait, 2 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?) runtime/proc.go:435 +0xce fp=0xc0014b55d8 sp=0xc0014b55b8 pc=0x610fe9187d4e runtime.netpollblock(0x610fe91ab0b8?, 0xe9120b46?, 0xf?) runtime/netpoll.go:575 +0xf7 fp=0xc0014b5610 sp=0xc0014b55d8 pc=0x610fe914c837 internal/poll.runtime_pollWait(0x7b5e085c6d98, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc0014b5630 sp=0xc0014b5610 pc=0x610fe9186f65 internal/poll.(*pollDesc).wait(0xc00004e080?, 0xc00221c0a1?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0014b5658 sp=0xc0014b5630 pc=0x610fe920e3a7 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc00004e080, {0xc00221c0a1, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x27a fp=0xc0014b56f0 sp=0xc0014b5658 pc=0x610fe920f69a net.(*netFD).Read(0xc00004e080, {0xc00221c0a1?, 0xc0018f2058?, 0xc0014b5770?}) net/fd_posix.go:55 +0x25 fp=0xc0014b5738 sp=0xc0014b56f0 pc=0x610fe9283de5 net.(*conn).Read(0xc001f10000, {0xc00221c0a1?, 0x0?, 0x0?}) net/net.go:194 +0x45 fp=0xc0014b5780 sp=0xc0014b5738 pc=0x610fe92921a5 net/http.(*connReader).backgroundRead(0xc00221c090) net/http/server.go:690 +0x37 fp=0xc0014b57c8 sp=0xc0014b5780 pc=0x610fe947e017 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:686 +0x25 fp=0xc0014b57e0 sp=0xc0014b57c8 pc=0x610fe947df45 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0014b57e8 sp=0xc0014b57e0 pc=0x610fe918f481 created by net/http.(*connReader).startBackgroundRead in goroutine 72 net/http/server.go:686 +0xb6 rax 0x204a03f7c rbx 0x7b5df87b1d20 rcx 0xfdf rdx 0x7b5df8649b10 rdi 0x7b5df8649b20 rsi 0x0 rbp 0x7b5d2d7fd500 rsp 0x7b5d2d7fd4e0 r8 0x0 r9 0x8d4a9743 r10 0x0 r11 0x246 r12 0x7b56f000e520 r13 0x7b5df8649b20 r14 0x0 r15 0x7b5df8002d10 rip 0x7b5de1024fb7 rflags 0x10297 cs 0x33 fs 0x0 gs 0x0 SIGABRT: abort PC=0x7b5e4f96db2c m=14 sigcode=18446744073709551610 signal arrived during cgo execution goroutine 22 gp=0xc000306e00 m=14 mp=0xc000581008 [syscall]: runtime.cgocall(0x610fe9e51720, 0xc0042f3a58) runtime/cgocall.go:167 +0x4b fp=0xc0042f3a30 sp=0xc0042f39f8 pc=0x610fe91848cb github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x7b5df90110f0, 0x7b572c006150) _cgo_gotypes.go:886 +0x4a fp=0xc0042f3a58 sp=0xc0042f3a30 pc=0x610fe95bec6a github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute.func1(...) github.com/ollama/ollama/ml/backend/ggml/ggml.go:631 github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc001a68000, {0xc034cbc360, 0x1, 0x0?}) github.com/ollama/ollama/ml/backend/ggml/ggml.go:631 +0x9d fp=0xc0042f3b00 sp=0xc0042f3a58 pc=0x610fe95ca17d github.com/ollama/ollama/model.Forward({0x610fea505a90, 0xc001a68000}, {0x610fea4fc2b0, 0xc0000e96b0}, {0xc0013b8800, 0x200, 0x200}, {{0x610fea5106e8, 0xc0018f0018}, {0x0, ...}, ...}) github.com/ollama/ollama/model/model.go:305 +0x2a7 fp=0xc0042f3be8 sp=0xc0042f3b00 pc=0x610fe95d8147 github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc0007285a0) github.com/ollama/ollama/runner/ollamarunner/runner.go:480 +0x4c5 fp=0xc0042f3f98 sp=0xc0042f3be8 pc=0x610fe9679085 github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0007285a0, {0x610fea4fd790, 0xc000690690}) github.com/ollama/ollama/runner/ollamarunner/runner.go:362 +0x4e fp=0xc0042f3fb8 sp=0xc0042f3f98 pc=0x610fe9678b6e github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2() github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0x28 fp=0xc0042f3fe0 sp=0xc0042f3fb8 pc=0x610fe967e2c8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0042f3fe8 sp=0xc0042f3fe0 pc=0x610fe918f481 created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74 goroutine 1 gp=0xc000002380 m=nil [IO wait, 2 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000313650 sp=0xc000313630 pc=0x610fe9187d4e runtime.netpollblock(0xc0003136a0?, 0xe9120b46?, 0xf?) runtime/netpoll.go:575 +0xf7 fp=0xc000313688 sp=0xc000313650 pc=0x610fe914c837 internal/poll.runtime_pollWait(0x7b5e085c6eb0, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc0003136a8 sp=0xc000313688 pc=0x610fe9186f65 internal/poll.(*pollDesc).wait(0xc000716280?, 0x90012ae3e?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0003136d0 sp=0xc0003136a8 pc=0x610fe920e3a7 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0xc000716280) internal/poll/fd_unix.go:620 +0x295 fp=0xc000313778 sp=0xc0003136d0 pc=0x610fe9213775 net.(*netFD).accept(0xc000716280) net/fd_unix.go:172 +0x29 fp=0xc000313830 sp=0xc000313778 pc=0x610fe9285d89 net.(*TCPListener).accept(0xc0001406c0) net/tcpsock_posix.go:159 +0x1b fp=0xc000313880 sp=0xc000313830 pc=0x610fe929b73b net.(*TCPListener).Accept(0xc0001406c0) net/tcpsock.go:380 +0x30 fp=0xc0003138b0 sp=0xc000313880 pc=0x610fe929a5f0 net/http.(*onceCloseListener).Accept(0xc0005443f0?) <autogenerated>:1 +0x24 fp=0xc0003138c8 sp=0xc0003138b0 pc=0x610fe94b1d44 net/http.(*Server).Serve(0xc0001ff400, {0x610fea4fb2e8, 0xc0001406c0}) net/http/server.go:3424 +0x30c fp=0xc0003139f8 sp=0xc0003138c8 pc=0x610fe948960c github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000034150, 0xe, 0xf}) github.com/ollama/ollama/runner/ollamarunner/runner.go:984 +0xe09 fp=0xc000313d08 sp=0xc0003139f8 pc=0x610fe967e029 github.com/ollama/ollama/runner.Execute({0xc000034130?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc000313d30 sp=0xc000313d08 pc=0x610fe967e929 github.com/ollama/ollama/cmd.NewCLI.func2(0xc0001ff200?, {0x610fea03e07e?, 0x4?, 0x610fea03e082?}) github.com/ollama/ollama/cmd/cmd.go:1583 +0x45 fp=0xc000313d58 sp=0xc000313d30 pc=0x610fe9de3685 github.com/spf13/cobra.(*Command).execute(0xc000546f08, {0xc0005a8870, 0xf, 0xf}) github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000313e78 sp=0xc000313d58 pc=0x610fe92ff3dc github.com/spf13/cobra.(*Command).ExecuteC(0xc000734908) github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000313f30 sp=0xc000313e78 pc=0x610fe92ffc25 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000313f50 sp=0xc000313f30 pc=0x610fe9de416d runtime.main() runtime/proc.go:283 +0x29d fp=0xc000313fe0 sp=0xc000313f50 pc=0x610fe9153ebd runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000313fe8 sp=0xc000313fe0 pc=0x610fe918f481 goroutine 2 gp=0xc000002e00 m=nil [force gc (idle), 5 minutes]: runtime.gopark(0x239b11b73f90?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x610fe9187d4e runtime.goparkunlock(...) runtime/proc.go:441 runtime.forcegchelper() runtime/proc.go:348 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x610fe91541f8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x610fe918f481 created by runtime.init.7 in goroutine 1 runtime/proc.go:336 +0x1a goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x610fe9187d4e runtime.goparkunlock(...) runtime/proc.go:441 runtime.bgsweep(0xc0000aa000) runtime/mgcsweep.go:316 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x610fe913e99f runtime.gcenable.gowrap1() runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x610fe9132d85 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x610fe918f481 created by runtime.gcenable in goroutine 1 runtime/mgc.go:204 +0x66 goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x496245?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x610fe9187d4e runtime.goparkunlock(...) runtime/proc.go:441 runtime.(*scavengerState).park(0x610fead929a0) runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x610fe913c3e9 runtime.bgscavenge(0xc0000aa000) runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x610fe913c979 runtime.gcenable.gowrap2() runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x610fe9132d25 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x610fe918f481 created by runtime.gcenable in goroutine 1 runtime/mgc.go:205 +0xa5 goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait, 5 minutes]: runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000084688?) runtime/proc.go:435 +0xce fp=0xc000084630 sp=0xc000084610 pc=0x610fe9187d4e runtime.runfinq() runtime/mfinal.go:196 +0x107 fp=0xc0000847e0 sp=0xc000084630 pc=0x610fe9131d47 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x610fe918f481 created by runtime.createfing in goroutine 1 runtime/mfinal.go:166 +0x3d goroutine 6 gp=0xc0001dc8c0 m=nil [chan receive]: runtime.gopark(0xc000235540?, 0xc0018f08b8?, 0x60?, 0x67?, 0x610fe926c9c8?) runtime/proc.go:435 +0xce fp=0xc000086718 sp=0xc0000866f8 pc=0x610fe9187d4e runtime.chanrecv(0xc0000b8310, 0x0, 0x1) runtime/chan.go:664 +0x445 fp=0xc000086790 sp=0xc000086718 pc=0x610fe9123725 runtime.chanrecv1(0x0?, 0x0?) runtime/chan.go:506 +0x12 fp=0xc0000867b8 sp=0xc000086790 pc=0x610fe91232b2 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() runtime/mgc.go:1799 +0x2f fp=0xc0000867e0 sp=0xc0000867b8 pc=0x610fe9135f2f runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x610fe918f481 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 runtime/mgc.go:1794 +0x85 goroutine 7 gp=0xc0001dce00 m=nil [GC worker (idle)]: runtime.gopark(0x23c1603d3d41?, 0x3?, 0xc3?, 0x71?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 8 gp=0xc0001dcfc0 m=nil [GC worker (idle)]: runtime.gopark(0x610feae411e0?, 0x1?, 0x95?, 0x3a?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000087738 sp=0xc000087718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc0000877c8 sp=0xc000087738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000877e0 sp=0xc0000877c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 9 gp=0xc0001dd180 m=nil [GC worker (idle)]: runtime.gopark(0x610feae411e0?, 0x1?, 0xab?, 0x3?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000087f38 sp=0xc000087f18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc000087fc8 sp=0xc000087f38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000087fe0 sp=0xc000087fc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 10 gp=0xc0001dd340 m=nil [GC worker (idle)]: runtime.gopark(0x23bef7e001da?, 0x3?, 0x95?, 0x7?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000080738 sp=0xc000080718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc0000807c8 sp=0xc000080738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 11 gp=0xc0001dd500 m=nil [GC worker (idle)]: runtime.gopark(0x23c1603ccc08?, 0x3?, 0xa5?, 0xc7?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 12 gp=0xc0001dd6c0 m=nil [GC worker (idle)]: runtime.gopark(0x23c15fb13c79?, 0x1?, 0x11?, 0x7e?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000081738 sp=0xc000081718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc0000817c8 sp=0xc000081738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000817e0 sp=0xc0000817c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 13 gp=0xc0001dd880 m=nil [GC worker (idle)]: runtime.gopark(0x23c1603d1eae?, 0x3?, 0x8e?, 0xb2?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 14 gp=0xc0001dda40 m=nil [GC worker (idle)]: runtime.gopark(0x610feae411e0?, 0x1?, 0x9d?, 0xd3?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000082738 sp=0xc000082718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc0000827c8 sp=0xc000082738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000827e0 sp=0xc0000827c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000827e8 sp=0xc0000827e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 15 gp=0xc0001ddc00 m=nil [GC worker (idle)]: runtime.gopark(0x610feae411e0?, 0x1?, 0x88?, 0xc1?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000082f38 sp=0xc000082f18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc000082fc8 sp=0xc000082f38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000082fe0 sp=0xc000082fc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000082fe8 sp=0xc000082fe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 16 gp=0xc0001dddc0 m=nil [GC worker (idle)]: runtime.gopark(0x23c1603f6bf7?, 0x1?, 0xfa?, 0xa7?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000083738 sp=0xc000083718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc0000837c8 sp=0xc000083738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000837e0 sp=0xc0000837c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000837e8 sp=0xc0000837e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 18 gp=0xc000524000 m=nil [GC worker (idle)]: runtime.gopark(0x23c15fb13c5e?, 0x3?, 0x66?, 0xb1?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000083f38 sp=0xc000083f18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc000083fc8 sp=0xc000083f38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000083fe0 sp=0xc000083fc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000083fe8 sp=0xc000083fe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 19 gp=0xc0005241c0 m=nil [GC worker (idle)]: runtime.gopark(0x23c15fb136df?, 0x3?, 0xbf?, 0xf9?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00052a738 sp=0xc00052a718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc00052a7c8 sp=0xc00052a738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00052a7e0 sp=0xc00052a7c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00052a7e8 sp=0xc00052a7e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 34 gp=0xc000306000 m=nil [GC worker (idle)]: runtime.gopark(0x23c1603c83f3?, 0x3?, 0xbc?, 0xf1?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000526738 sp=0xc000526718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc0005267c8 sp=0xc000526738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0005267e0 sp=0xc0005267c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0005267e8 sp=0xc0005267e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 50 gp=0xc000102380 m=nil [GC worker (idle)]: runtime.gopark(0x610feae411e0?, 0x1?, 0x9a?, 0xa1?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 51 gp=0xc000102540 m=nil [GC worker (idle)]: runtime.gopark(0x23c15fb1177d?, 0x1?, 0x0?, 0x1c?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 20 gp=0xc000524380 m=nil [GC worker (idle)]: runtime.gopark(0x23c1607d4ede?, 0x3?, 0x5f?, 0x98?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00052af38 sp=0xc00052af18 pc=0x610fe9187d4e runtime.gcBgMarkWorker(0xc0000b98f0) runtime/mgc.go:1423 +0xe9 fp=0xc00052afc8 sp=0xc00052af38 pc=0x610fe9135249 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00052afe0 sp=0xc00052afc8 pc=0x610fe9135125 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00052afe8 sp=0xc00052afe0 pc=0x610fe918f481 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 72 gp=0xc000583180 m=nil [select, 2 minutes]: runtime.gopark(0xc000315a10?, 0x2?, 0x0?, 0x0?, 0xc000315874?) runtime/proc.go:435 +0xce fp=0xc0003156a0 sp=0xc000315680 pc=0x610fe9187d4e runtime.selectgo(0xc000315a10, 0xc000315870, 0x7b52?, 0x0, 0x4?, 0x1) runtime/select.go:351 +0x837 fp=0xc0003157d8 sp=0xc0003156a0 pc=0x610fe91663b7 github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc0007285a0, {0x610fea4fb4c8, 0xc000000000}, 0xc001b00140) github.com/ollama/ollama/runner/ollamarunner/runner.go:680 +0xb65 fp=0xc000315ac0 sp=0xc0003157d8 pc=0x610fe967b3c5 github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x610fea4fb4c8?, 0xc000000000?}, 0xc000315b40?) <autogenerated>:1 +0x36 fp=0xc000315af0 sp=0xc000315ac0 pc=0x610fe967e796 net/http.HandlerFunc.ServeHTTP(0xc0005b52c0?, {0x610fea4fb4c8?, 0xc000000000?}, 0xc000315b60?) net/http/server.go:2294 +0x29 fp=0xc000315b18 sp=0xc000315af0 pc=0x610fe9485c49 net/http.(*ServeMux).ServeHTTP(0x610fe912c265?, {0x610fea4fb4c8, 0xc000000000}, 0xc001b00140) net/http/server.go:2822 +0x1c4 fp=0xc000315b68 sp=0xc000315b18 pc=0x610fe9487b44 net/http.serverHandler.ServeHTTP({0x610fea4f7b10?}, {0x610fea4fb4c8?, 0xc000000000?}, 0x1?) net/http/server.go:3301 +0x8e fp=0xc000315b98 sp=0xc000315b68 pc=0x610fe94a55ce net/http.(*conn).serve(0xc0005443f0, {0x610fea4fd758, 0xc0000ffaa0}) net/http/server.go:2102 +0x625 fp=0xc000315fb8 sp=0xc000315b98 pc=0x610fe9484145 net/http.(*Server).Serve.gowrap3() net/http/server.go:3454 +0x28 fp=0xc000315fe0 sp=0xc000315fb8 pc=0x610fe9489a08 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000315fe8 sp=0xc000315fe0 pc=0x610fe918f481 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3454 +0x485 goroutine 73 gp=0xc000307a40 m=nil [IO wait, 2 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?) runtime/proc.go:435 +0xce fp=0xc0014b55d8 sp=0xc0014b55b8 pc=0x610fe9187d4e runtime.netpollblock(0x610fe91ab0b8?, 0xe9120b46?, 0xf?) runtime/netpoll.go:575 +0xf7 fp=0xc0014b5610 sp=0xc0014b55d8 pc=0x610fe914c837 internal/poll.runtime_pollWait(0x7b5e085c6d98, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc0014b5630 sp=0xc0014b5610 pc=0x610fe9186f65 internal/poll.(*pollDesc).wait(0xc00004e080?, 0xc00221c0a1?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0014b5658 sp=0xc0014b5630 pc=0x610fe920e3a7 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc00004e080, {0xc00221c0a1, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x27a fp=0xc0014b56f0 sp=0xc0014b5658 pc=0x610fe920f69a net.(*netFD).Read(0xc00004e080, {0xc00221c0a1?, 0xc0018f2058?, 0xc0014b5770?}) net/fd_posix.go:55 +0x25 fp=0xc0014b5738 sp=0xc0014b56f0 pc=0x610fe9283de5 net.(*conn).Read(0xc001f10000, {0xc00221c0a1?, 0x0?, 0x0?}) net/net.go:194 +0x45 fp=0xc0014b5780 sp=0xc0014b5738 pc=0x610fe92921a5 net/http.(*connReader).backgroundRead(0xc00221c090) net/http/server.go:690 +0x37 fp=0xc0014b57c8 sp=0xc0014b5780 pc=0x610fe947e017 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:686 +0x25 fp=0xc0014b57e0 sp=0xc0014b57c8 pc=0x610fe947df45 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0014b57e8 sp=0xc0014b57e0 pc=0x610fe918f481 created by net/http.(*connReader).startBackgroundRead in goroutine 72 net/http/server.go:686 +0xb6 rax 0x0 rbx 0x298 rcx 0x7b5e4f96db2c rdx 0x6 rdi 0x288 rsi 0x298 rbp 0x7b5d2d7fd670 rsp 0x7b5d2d7fd630 r8 0x0 r9 0x0 r10 0x8 r11 0x246 r12 0x6 r13 0x4d r14 0x16 r15 0x7b572c04f1c0 rip 0x7b5e4f96db2c rflags 0x246 cs 0x33 fs 0x0 gs 0x0 time=2025-08-06T19:48:17.187Z level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:44317/completion\": EOF" [GIN] 2025/08/06 - 19:48:17 | 200 | 53.432952415s | 172.18.0.1 | POST "/api/chat" time=2025-08-06T19:48:17.187Z level=DEBUG source=sched.go:432 msg="context for request finished" runner.name=registry.ollama.ai/library/gpt-oss:20b runner.inference=cuda runner.devices=1 runner.size="22.6 GiB" runner.vram="22.6 GiB" runner.parallel=1 runner.pid=648 runner.model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 runner.num_ctx=36864 time=2025-08-06T19:48:17.187Z level=DEBUG source=sched.go:341 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gpt-oss:20b runner.inference=cuda runner.devices=1 runner.size="22.6 GiB" runner.vram="22.6 GiB" runner.parallel=1 runner.pid=648 runner.model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 runner.num_ctx=36864 duration=2562047h47m16.854775807s time=2025-08-06T19:48:17.187Z level=DEBUG source=sched.go:359 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gpt-oss:20b runner.inference=cuda runner.devices=1 runner.size="22.6 GiB" runner.vram="22.6 GiB" runner.parallel=1 runner.pid=648 runner.model=/root/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 runner.num_ctx=36864 refCount=0 time=2025-08-06T19:48:17.240Z level=ERROR source=server.go:464 msg="llama runner terminated" error="exit status 2" ``` </details> ### OS Linux ### GPU Nvidia ### CPU Intel ### Ollama version 0.11.3
GiteaMirror added the bug label 2026-05-10 00:16:34 -05:00
Author
Owner

@asif-a-srbd commented on GitHub (Aug 20, 2025):

Faced similar issue while using GPT OSS 20B with huggingface transformers pipeline. At every inference the memory usage spiked up by ~3 GB and ran out of memory after 3 calls. So maybe this is not an Ollama specific issue.

<!-- gh-comment-id:3204820932 --> @asif-a-srbd commented on GitHub (Aug 20, 2025): Faced similar issue while using GPT OSS 20B with huggingface transformers pipeline. At every inference the memory usage spiked up by ~3 GB and ran out of memory after 3 calls. So maybe this is not an Ollama specific issue.
Author
Owner

@jessegross commented on GitHub (Aug 21, 2025):

It's actually not a memory leak, it's a pool allocator that adds additional memory as needed, which it does as the history grows. This is used for temporary conversion buffers, which are needed more for gpt-oss as it is published with some BF16 tensors.

That's not to say it isn't a problem.

<!-- gh-comment-id:3211843157 --> @jessegross commented on GitHub (Aug 21, 2025): It's actually not a memory leak, it's a pool allocator that adds additional memory as needed, which it does as the history grows. This is used for temporary conversion buffers, which are needed more for gpt-oss as it is published with some BF16 tensors. That's not to say it isn't a problem.
Author
Owner

@jessegross commented on GitHub (Aug 21, 2025):

As a mitigation, this can be avoided by setting OLLAMA_FLASH_ATTENTION=1

<!-- gh-comment-id:3212081344 --> @jessegross commented on GitHub (Aug 21, 2025): As a mitigation, this can be avoided by setting OLLAMA_FLASH_ATTENTION=1
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#85476