[GH-ISSUE #9685] Ollama unable to run Gemma:27b #68378

New Issue

GiteaMirror · 2026-05-04T13:41:12-05:00

GiteaMirror commented

2026-05-04 13:41:12 -05:00

Originally created by @Jan-Fuchs on GitHub (Mar 12, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9685

What is the issue?

Output is always the same, all other models, even bigger ones work without issues

ggml_cuda_init: found 1 CUDA devices:
Device 0: Orin, compute capability 8.7, VMM: yes
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_jetpack6/libggml-cuda.so
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu.so
time=2025-03-12T12:11:35.233Z level=INFO source=ggml.go:109 msg=system CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.LLAMAFILE=1 CUDA.0.ARCHS=870 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CPU.1.NEON=1 CPU.1.ARM_FMA=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-03-12T12:11:39.337Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="16.2 GiB"
time=2025-03-12T12:11:39.338Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="1.1 GiB"
time=2025-03-12T12:14:15.838Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding"
time=2025-03-12T12:14:21.512Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-03-12T12:14:22.015Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: killed"
[GIN] 2025/03/12 - 12:14:22 | 500 | 2m48s | 127.0.0.1 | POST "/api/generate"

Relevant log output

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

Originally created by @Jan-Fuchs on GitHub (Mar 12, 2025). Original GitHub issue: https://github.com/ollama/ollama/issues/9685 ### What is the issue? Output is always the same, all other models, even bigger ones work without issues ggml_cuda_init: found 1 CUDA devices: Device 0: Orin, compute capability 8.7, VMM: yes load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_jetpack6/libggml-cuda.so load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu.so time=2025-03-12T12:11:35.233Z level=INFO source=ggml.go:109 msg=system CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.LLAMAFILE=1 CUDA.0.ARCHS=870 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CPU.1.NEON=1 CPU.1.ARM_FMA=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2025-03-12T12:11:39.337Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="16.2 GiB" time=2025-03-12T12:11:39.338Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="1.1 GiB" time=2025-03-12T12:14:15.838Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding" time=2025-03-12T12:14:21.512Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-03-12T12:14:22.015Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: killed" [GIN] 2025/03/12 - 12:14:22 | 500 | 2m48s | 127.0.0.1 | POST "/api/generate" ### Relevant log output ```shell ``` ### OS _No response_ ### GPU _No response_ ### CPU _No response_ ### Ollama version _No response_

GiteaMirror added the bug needs more info labels 2026-05-04 13:41:13 -05:00

GiteaMirror closed this issue

2026-05-04 13:41:14 -05:00

GiteaMirror commented

2026-05-04 13:41:18 -05:00

@UlrikWKoren commented on GitHub (Mar 12, 2025):

Update your client.

@UlrikWKoren commented on GitHub (Mar 12, 2025): Update your client.

GiteaMirror commented

2026-05-04 13:41:20 -05:00

@Jan-Fuchs commented on GitHub (Mar 12, 2025):

Update your client.

Did that about 4 times today already

@Jan-Fuchs commented on GitHub (Mar 12, 2025): > Update your client. Did that about 4 times today already

GiteaMirror commented

2026-05-04 13:41:21 -05:00

@rick-github commented on GitHub (Mar 12, 2025):

"llama runner process has terminated: signal: killed"

signal killed generally means that an external process killed the runner, perhaps because of OOM. What do the system logs (/var/log/kernel, dmesg, etc) show?

@rick-github commented on GitHub (Mar 12, 2025): ``` "llama runner process has terminated: signal: killed" ``` signal killed generally means that an external process killed the runner, perhaps because of OOM. What do the system logs (/var/log/kernel, dmesg, etc) show?

GiteaMirror commented

2026-05-04 13:41:25 -05:00

@nachocheeseburger commented on GitHub (Mar 12, 2025):

same issue here. i can't run the 12b either. but i can run much bigger non-gemma models with no issue

@nachocheeseburger commented on GitHub (Mar 12, 2025): same issue here. i can't run the 12b either. but i can run much bigger non-gemma models with no issue

GiteaMirror commented

2026-05-04 13:41:29 -05:00

@jarek7777 commented on GitHub (Mar 12, 2025):

Same her. I was trying run gemma3:27b-it-fp16. Two A6000 ADA, Ubuntu 24.0,

Error: llama runner process has terminated: signal: killed
or
Error: Post "http://127.0.0.1:11434/api/generate": EOF

gemma3:12b is fine

@jarek7777 commented on GitHub (Mar 12, 2025): Same her. I was trying run gemma3:27b-it-fp16. Two A6000 ADA, Ubuntu 24.0, Error: llama runner process has terminated: signal: killed or Error: Post "http://127.0.0.1:11434/api/generate": EOF gemma3:12b is fine

GiteaMirror commented

2026-05-04 13:41:32 -05:00

@rick-github commented on GitHub (Mar 12, 2025):

Server logs will aid in debugging.

@rick-github commented on GitHub (Mar 12, 2025): [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.

GiteaMirror commented

2026-05-04 13:41:33 -05:00

@nachocheeseburger commented on GitHub (Mar 12, 2025):

here's the logs when using api..

time=2025-03-12T14:23:03.214Z level=INFO source=images.go:432 msg="total blobs: 55"
time=2025-03-12T14:23:03.214Z level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-12T14:23:03.215Z level=INFO source=routes.go:1292 msg="Listening on [::]:11434 (version 0.6.0)"
time=2025-03-12T14:23:03.215Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-12T14:23:03.447Z level=INFO source=types.go:130 msg="inference compute" id=GPU-07f2832b-0aaa-0fda-9cc3-d53aecc3d142 library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4090 Laptop GPU" total="15.6 GiB" available="14.5 GiB"
time=2025-03-12T14:23:29.240Z level=INFO source=server.go:105 msg="system memory" total="62.4 GiB" free="54.8 GiB" free_swap="0 B"
time=2025-03-12T14:23:29.240Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB"
time=2025-03-12T14:23:29.288Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:23:29.289Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-03-12T14:23:29.291Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:23:29.294Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-03-12T14:23:29.294Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-03-12T14:23:29.294Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-03-12T14:23:29.294Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-03-12T14:23:29.294Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30
time=2025-03-12T14:23:29.294Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-03-12T14:23:29.294Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 --ctx-size 32000 --batch-size 512 --n-gpu-layers 33 --threads 6 --parallel 1 --port 36283"
time=2025-03-12T14:23:29.294Z level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-12T14:23:29.294Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-03-12T14:23:29.295Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-03-12T14:23:29.301Z level=INFO source=runner.go:882 msg="starting ollama engine"
time=2025-03-12T14:23:29.301Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:36283"
time=2025-03-12T14:23:29.349Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default=""
time=2025-03-12T14:23:29.349Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default=""
time=2025-03-12T14:23:29.349Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=36
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090 Laptop GPU, compute capability 8.9, VMM: yes
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
time=2025-03-12T14:23:29.386Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-03-12T14:23:29.493Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="4.1 GiB"
time=2025-03-12T14:23:29.493Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="4.2 GiB"
time=2025-03-12T14:23:29.549Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-03-12T14:23:30.436Z level=INFO source=ggml.go:356 msg="compute graph" backend=CUDA0 buffer_type=CUDA0
time=2025-03-12T14:23:30.436Z level=INFO source=ggml.go:356 msg="compute graph" backend=CPU buffer_type=CUDA_Host
time=2025-03-12T14:23:30.436Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:23:30.437Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-03-12T14:23:30.439Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:23:30.441Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-03-12T14:23:30.441Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-03-12T14:23:30.441Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-03-12T14:23:30.441Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-03-12T14:23:30.441Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30
time=2025-03-12T14:23:30.441Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-03-12T14:23:30.555Z level=INFO source=server.go:624 msg="llama runner started in 1.26 seconds"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 15554.32 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 16309888000
SIGSEGV: segmentation violation
PC=0x6415ab24ffd0 m=164 sigcode=1 addr=0x58
signal arrived during cgo execution

goroutine 10 gp=0xc000285dc0 m=164 mp=0xc00350e808 [syscall]:
runtime.cgocall(0x6415ab2a3fe0, 0xc0002dbb00)
	runtime/cgocall.go:167 +0x4b fp=0xc0002dbad8 sp=0xc0002dbaa0 pc=0x6415aa47260b
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x74fbe801a170, 0x7500404de2e0)
	_cgo_gotypes.go:485 +0x4a fp=0xc0002dbb00 sp=0xc0002dbad8 pc=0x6415aa85d1aa
github.com/ollama/ollama/ml/backend/ggml.Context.Compute.func1(...)
	github.com/ollama/ollama/ml/backend/ggml/ggml.go:497
github.com/ollama/ollama/ml/backend/ggml.Context.Compute({0xc00059e040, 0x74fbb8002fd0, 0x7500404de2e0, 0x0, 0x2000}, {0xc06ffa41a0, 0x1, 0x6415ab75f210?})
	github.com/ollama/ollama/ml/backend/ggml/ggml.go:497 +0xbd fp=0xc0002dbb90 sp=0xc0002dbb00 pc=0x6415aa865a7d
github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc06fe4a2d0?, {0xc06ffa41a0?, 0x200?, 0x0?})
	<autogenerated>:1 +0x72 fp=0xc0002dbc08 sp=0xc0002dbb90 pc=0x6415aa86b4f2
github.com/ollama/ollama/model.Forward({0x6415ab756c00, 0xc06fe4a2d0}, {0x6415ab74e2b0, 0xc00034e000}, {{0xc00440f800, 0x200, 0x200}, {0x0, 0x0, 0x0}, ...})
	github.com/ollama/ollama/model/model.go:303 +0x218 fp=0xc0002dbcf0 sp=0xc0002dbc08 pc=0x6415aa892718
github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc0005c19e0)
	github.com/ollama/ollama/runner/ollamarunner/runner.go:395 +0x3bb fp=0xc0002dbf98 sp=0xc0002dbcf0 pc=0x6415aa8fedfb
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0005c19e0, {0x6415ab74f5e0, 0xc000524370})
	github.com/ollama/ollama/runner/ollamarunner/runner.go:321 +0x4e fp=0xc0002dbfb8 sp=0xc0002dbf98 pc=0x6415aa8fe9ee
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2()
	github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0x28 fp=0xc0002dbfe0 sp=0xc0002dbfb8 pc=0x6415aa903668
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0002dbfe8 sp=0xc0002dbfe0 pc=0x6415aa47d021
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0xa9c

goroutine 1 gp=0xc000002380 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc0002d7648 sp=0xc0002d7628 pc=0x6415aa4758ee
runtime.netpollblock(0xc0002d7698?, 0xaa40f226?, 0x15?)
	runtime/netpoll.go:575 +0xf7 fp=0xc0002d7680 sp=0xc0002d7648 pc=0x6415aa43a6f7
internal/poll.runtime_pollWait(0x75018525feb0, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc0002d76a0 sp=0xc0002d7680 pc=0x6415aa474b05
internal/poll.(*pollDesc).wait(0xc0004a3e00?, 0x0?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0002d76c8 sp=0xc0002d76a0 pc=0x6415aa4fbf87
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc0004a3e00)
	internal/poll/fd_unix.go:620 +0x295 fp=0xc0002d7770 sp=0xc0002d76c8 pc=0x6415aa501355
net.(*netFD).accept(0xc0004a3e00)
	net/fd_unix.go:172 +0x29 fp=0xc0002d7828 sp=0xc0002d7770 pc=0x6415aa574169
net.(*TCPListener).accept(0xc000520800)
	net/tcpsock_posix.go:159 +0x1b fp=0xc0002d7878 sp=0xc0002d7828 pc=0x6415aa589b1b
net.(*TCPListener).Accept(0xc000520800)
	net/tcpsock.go:380 +0x30 fp=0xc0002d78a8 sp=0xc0002d7878 pc=0x6415aa5889d0
net/http.(*onceCloseListener).Accept(0xc0004c0120?)
	<autogenerated>:1 +0x24 fp=0xc0002d78c0 sp=0xc0002d78a8 pc=0x6415aa79fb44
net/http.(*Server).Serve(0xc000587100, {0x6415ab74d318, 0xc000520800})
	net/http/server.go:3424 +0x30c fp=0xc0002d79f0 sp=0xc0002d78c0 pc=0x6415aa77740c
github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000034150, 0xe, 0xf})
	github.com/ollama/ollama/runner/ollamarunner/runner.go:939 +0xe6a fp=0xc0002d7d08 sp=0xc0002d79f0 pc=0x6415aa9033aa
github.com/ollama/ollama/runner.Execute({0xc000034130?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc0002d7d30 sp=0xc0002d7d08 pc=0x6415aa903f09
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000586f00?, {0x6415ab2bf054?, 0x4?, 0x6415ab2bf058?})
	github.com/ollama/ollama/cmd/cmd.go:1285 +0x45 fp=0xc0002d7d58 sp=0xc0002d7d30 pc=0x6415ab0746a5
github.com/spf13/cobra.(*Command).execute(0xc0004c2f08, {0xc0005d8c30, 0xf, 0xf})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc0002d7e78 sp=0xc0002d7d58 pc=0x6415aa5ed2fc
github.com/spf13/cobra.(*Command).ExecuteC(0xc00049af08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc0002d7f30 sp=0xc0002d7e78 pc=0x6415aa5edb45
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x4d fp=0xc0002d7f50 sp=0xc0002d7f30 pc=0x6415ab074a0d
runtime.main()
	runtime/proc.go:283 +0x29d fp=0xc0002d7fe0 sp=0xc0002d7f50 pc=0x6415aa441cfd
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0002d7fe8 sp=0xc0002d7fe0 pc=0x6415aa47d021

goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000090fa8 sp=0xc000090f88 pc=0x6415aa4758ee
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.forcegchelper()
	runtime/proc.go:348 +0xb8 fp=0xc000090fe0 sp=0xc000090fa8 pc=0x6415aa442038
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000090fe8 sp=0xc000090fe0 pc=0x6415aa47d021
created by runtime.init.7 in goroutine 1
	runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000091780 sp=0xc000091760 pc=0x6415aa4758ee
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.bgsweep(0xc000042100)
	runtime/mgcsweep.go:316 +0xdf fp=0xc0000917c8 sp=0xc000091780 pc=0x6415aa42c85f
runtime.gcenable.gowrap1()
	runtime/mgc.go:204 +0x25 fp=0xc0000917e0 sp=0xc0000917c8 pc=0x6415aa420c45
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000917e8 sp=0xc0000917e0 pc=0x6415aa47d021
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x6415ab475a00?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000091f78 sp=0xc000091f58 pc=0x6415aa4758ee
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.(*scavengerState).park(0x6415abfb3b40)
	runtime/mgcscavenge.go:425 +0x49 fp=0xc000091fa8 sp=0xc000091f78 pc=0x6415aa42a2a9
runtime.bgscavenge(0xc000042100)
	runtime/mgcscavenge.go:658 +0x59 fp=0xc000091fc8 sp=0xc000091fa8 pc=0x6415aa42a839
runtime.gcenable.gowrap2()
	runtime/mgc.go:205 +0x25 fp=0xc000091fe0 sp=0xc000091fc8 pc=0x6415aa420be5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000091fe8 sp=0xc000091fe0 pc=0x6415aa47d021
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000090688?)
	runtime/proc.go:435 +0xce fp=0xc000090630 sp=0xc000090610 pc=0x6415aa4758ee
runtime.runfinq()
	runtime/mfinal.go:196 +0x107 fp=0xc0000907e0 sp=0xc000090630 pc=0x6415aa41fc07
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000907e8 sp=0xc0000907e0 pc=0x6415aa47d021
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc00016e8c0 m=nil [chan receive]:
runtime.gopark(0xc0001c74a0?, 0xc0700c0018?, 0x60?, 0x27?, 0x6415aa55aea8?)
	runtime/proc.go:435 +0xce fp=0xc000092718 sp=0xc0000926f8 pc=0x6415aa4758ee
runtime.chanrecv(0xc0000c8380, 0x0, 0x1)
	runtime/chan.go:664 +0x445 fp=0xc000092790 sp=0xc000092718 pc=0x6415aa411e05
runtime.chanrecv1(0x0?, 0x0?)
	runtime/chan.go:506 +0x12 fp=0xc0000927b8 sp=0xc000092790 pc=0x6415aa411992
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	runtime/mgc.go:1799 +0x2f fp=0xc0000927e0 sp=0xc0000927b8 pc=0x6415aa423def
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000927e8 sp=0xc0000927e0 pc=0x6415aa47d021
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc00016ee00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000092f38 sp=0xc000092f18 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000092fc8 sp=0xc000092f38 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000092fe0 sp=0xc000092fc8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000092fe8 sp=0xc000092fe0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc000502380 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf24fd87c1a?, 0x1?, 0x3?, 0x8f?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008c738 sp=0xc00008c718 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008c7c8 sp=0xc00008c738 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008c7e0 sp=0xc00008c7c8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008c7e8 sp=0xc00008c7e0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc000502540 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf245d1a7cd?, 0x3?, 0x59?, 0xef?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008cf38 sp=0xc00008cf18 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008cfc8 sp=0xc00008cf38 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008cfe0 sp=0xc00008cfc8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008cfe8 sp=0xc00008cfe0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000502700 m=nil [GC worker (idle)]:
runtime.gopark(0x6415ac0622c0?, 0x3?, 0x92?, 0x4e?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008d738 sp=0xc00008d718 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008d7c8 sp=0xc00008d738 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008d7e0 sp=0xc00008d7c8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008d7e8 sp=0xc00008d7e0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc000284380 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf245d1aed8?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00029a738 sp=0xc00029a718 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00029a7c8 sp=0xc00029a738 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00029a7e0 sp=0xc00029a7c8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00029a7e8 sp=0xc00029a7e0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc000284540 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf245d1c08e?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00029af38 sp=0xc00029af18 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00029afc8 sp=0xc00029af38 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00029afe0 sp=0xc00029afc8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00029afe8 sp=0xc00029afe0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 50 gp=0xc000584000 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf25be6146f?, 0x1?, 0x6c?, 0xfd?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000296738 sp=0xc000296718 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0002967c8 sp=0xc000296738 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0002967e0 sp=0xc0002967c8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0002967e8 sp=0xc0002967e0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 51 gp=0xc0005841c0 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf245d1c14c?, 0x3?, 0x82?, 0xc?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000296f38 sp=0xc000296f18 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000296fc8 sp=0xc000296f38 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000296fe0 sp=0xc000296fc8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000296fe8 sp=0xc000296fe0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 52 gp=0xc000584380 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf24f7c0fdc?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000297738 sp=0xc000297718 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0002977c8 sp=0xc000297738 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0002977e0 sp=0xc0002977c8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0002977e8 sp=0xc0002977e0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 53 gp=0xc000584540 m=nil [GC worker (idle)]:
runtime.gopark(0x6415ac0622c0?, 0x1?, 0x58?, 0xd7?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000297f38 sp=0xc000297f18 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000297fc8 sp=0xc000297f38 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000297fe0 sp=0xc000297fc8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000297fe8 sp=0xc000297fe0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 54 gp=0xc000584700 m=nil [GC worker (idle)]:
runtime.gopark(0x6415ac0622c0?, 0x1?, 0xe1?, 0xfc?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000298738 sp=0xc000298718 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0002987c8 sp=0xc000298738 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0002987e0 sp=0xc0002987c8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0002987e8 sp=0xc0002987e0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 55 gp=0xc0005848c0 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf25be81a73?, 0x1?, 0x3?, 0x25?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000298f38 sp=0xc000298f18 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000298fc8 sp=0xc000298f38 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000298fe0 sp=0xc000298fc8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000298fe8 sp=0xc000298fe0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 56 gp=0xc000584a80 m=nil [GC worker (idle)]:
runtime.gopark(0x6415ab73b068?, 0xc0005800e0?, 0x1b?, 0xa?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000299738 sp=0xc000299718 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0002997c8 sp=0xc000299738 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0002997e0 sp=0xc0002997c8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0002997e8 sp=0xc0002997e0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 57 gp=0xc000584c40 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf245d1b123?, 0x1?, 0xc1?, 0x29?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000299f38 sp=0xc000299f18 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000299fc8 sp=0xc000299f38 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000299fe0 sp=0xc000299fc8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000299fe8 sp=0xc000299fe0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 58 gp=0xc000584e00 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf245d1a8e0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00058c738 sp=0xc00058c718 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00058c7c8 sp=0xc00058c738 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00058c7e0 sp=0xc00058c7c8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00058c7e8 sp=0xc00058c7e0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 8 gp=0xc00016efc0 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf245d1ba55?, 0x3?, 0xee?, 0x4?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000093738 sp=0xc000093718 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0000937c8 sp=0xc000093738 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0000937e0 sp=0xc0000937c8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000937e8 sp=0xc0000937e0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 36 gp=0xc000284700 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf24f7d65af?, 0x1?, 0x14?, 0x74?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00029b738 sp=0xc00029b718 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00029b7c8 sp=0xc00029b738 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00029b7e0 sp=0xc00029b7c8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00029b7e8 sp=0xc00029b7e0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 37 gp=0xc0002848c0 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf245d1c19d?, 0x1?, 0x89?, 0x4c?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00029bf38 sp=0xc00029bf18 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00029bfc8 sp=0xc00029bf38 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00029bfe0 sp=0xc00029bfc8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00029bfe8 sp=0xc00029bfe0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 38 gp=0xc000284a80 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf245d1ae3a?, 0x1?, 0xe5?, 0x1a?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00029c738 sp=0xc00029c718 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00029c7c8 sp=0xc00029c738 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00029c7e0 sp=0xc00029c7c8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00029c7e8 sp=0xc00029c7e0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 39 gp=0xc000284c40 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf24f841efe?, 0xc0002920c0?, 0x1b?, 0xa?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00029cf38 sp=0xc00029cf18 pc=0x6415aa4758ee
runtime.gcBgMarkWorker(0xc0000c97a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00029cfc8 sp=0xc00029cf38 pc=0x6415aa423109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00029cfe0 sp=0xc00029cfc8 pc=0x6415aa422fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00029cfe8 sp=0xc00029cfe0 pc=0x6415aa47d021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 940 gp=0xc003232380 m=nil [IO wait]:
runtime.gopark(0x6415aa41a046?, 0xc0041bc808?, 0x0?, 0x0?, 0xb?)
	runtime/proc.go:435 +0xce fp=0xc00321a5d8 sp=0xc00321a5b8 pc=0x6415aa4758ee
runtime.netpollblock(0x6415aa498d78?, 0xaa40f226?, 0x15?)
	runtime/netpoll.go:575 +0xf7 fp=0xc00321a610 sp=0xc00321a5d8 pc=0x6415aa43a6f7
internal/poll.runtime_pollWait(0x75018525fd98, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc00321a630 sp=0xc00321a610 pc=0x6415aa474b05
internal/poll.(*pollDesc).wait(0xc0004a2000?, 0xc06908e4f1?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00321a658 sp=0xc00321a630 pc=0x6415aa4fbf87
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0004a2000, {0xc06908e4f1, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x27a fp=0xc00321a6f0 sp=0xc00321a658 pc=0x6415aa4fd27a
net.(*netFD).Read(0xc0004a2000, {0xc06908e4f1?, 0xc003002058?, 0xc00321a770?})
	net/fd_posix.go:55 +0x25 fp=0xc00321a738 sp=0xc00321a6f0 pc=0x6415aa5721c5
net.(*conn).Read(0xc000094090, {0xc06908e4f1?, 0xc003003880?, 0x6415aa85be40?})
	net/net.go:194 +0x45 fp=0xc00321a780 sp=0xc00321a738 pc=0x6415aa580585
net/http.(*connReader).backgroundRead(0xc06908e4e0)
	net/http/server.go:690 +0x37 fp=0xc00321a7c8 sp=0xc00321a780 pc=0x6415aa76be17
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:686 +0x25 fp=0xc00321a7e0 sp=0xc00321a7c8 pc=0x6415aa76bd45
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00321a7e8 sp=0xc00321a7e0 pc=0x6415aa47d021
created by net/http.(*connReader).startBackgroundRead in goroutine 982
	net/http/server.go:686 +0xb6

goroutine 982 gp=0xc06e978540 m=nil [select]:
runtime.gopark(0xc00442da68?, 0x2?, 0x0?, 0x65?, 0xc00442d80c?)
	runtime/proc.go:435 +0xce fp=0xc00442d620 sp=0xc00442d600 pc=0x6415aa4758ee
runtime.selectgo(0xc00442da68, 0xc00442d808, 0x821?, 0x0, 0x1?, 0x1)
	runtime/select.go:351 +0x837 fp=0xc00442d758 sp=0xc00442d620 pc=0x6415aa4541f7
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc0005c19e0, {0x6415ab74d4f8, 0xc00440c0e0}, 0xc003564140)
	github.com/ollama/ollama/runner/ollamarunner/runner.go:649 +0xad0 fp=0xc00442dac0 sp=0xc00442d758 pc=0x6415aa900e70
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x6415ab74d4f8?, 0xc00440c0e0?}, 0xc0002dbb40?)
	<autogenerated>:1 +0x36 fp=0xc00442daf0 sp=0xc00442dac0 pc=0x6415aa903a36
net/http.HandlerFunc.ServeHTTP(0xc000533140?, {0x6415ab74d4f8?, 0xc00440c0e0?}, 0xc0002dbb60?)
	net/http/server.go:2294 +0x29 fp=0xc00442db18 sp=0xc00442daf0 pc=0x6415aa773a49
net/http.(*ServeMux).ServeHTTP(0x6415aa41a125?, {0x6415ab74d4f8, 0xc00440c0e0}, 0xc003564140)
	net/http/server.go:2822 +0x1c4 fp=0xc00442db68 sp=0xc00442db18 pc=0x6415aa775944
net/http.serverHandler.ServeHTTP({0x6415ab749b10?}, {0x6415ab74d4f8?, 0xc00440c0e0?}, 0x1?)
	net/http/server.go:3301 +0x8e fp=0xc00442db98 sp=0xc00442db68 pc=0x6415aa7933ce
net/http.(*conn).serve(0xc0004c0120, {0x6415ab74f5a8, 0xc0004a7bc0})
	net/http/server.go:2102 +0x625 fp=0xc00442dfb8 sp=0xc00442db98 pc=0x6415aa771f45
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3454 +0x28 fp=0xc00442dfe0 sp=0xc00442dfb8 pc=0x6415aa777808
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00442dfe8 sp=0xc00442dfe0 pc=0x6415aa47d021
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3454 +0x485

rax    0x74fbe8354380
rbx    0x74fbe83542f0
rcx    0x2
rdx    0x74fbb81c0f00
rdi    0x0
rsi    0x74fb9eb33030
rbp    0x74fbb81c0ef0
rsp    0x74fbadffac48
r8     0x4
r9     0xc000094048
r10    0x1
r11    0x206
r12    0x0
r13    0x74fbe801a2c8
r14    0xa52
r15    0x74fbe83542f0
rip    0x6415ab24ffd0
rflags 0x10206
cs     0x33
fs     0x0
gs     0x0
[GIN] 2025/03/12 - 14:23:31 | 200 |  2.505928799s |      172.17.0.2 | POST     "/api/chat"
time=2025-03-12T14:23:31.575Z level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 2"
time=2025-03-12T14:23:36.546Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.191615043 model=/home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3
time=2025-03-12T14:23:36.796Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.440854627 model=/home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3
time=2025-03-12T14:23:37.108Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.753601007 model=/home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3
time=2025-03-12T14:23:37.278Z level=INFO source=server.go:105 msg="system memory" total="62.4 GiB" free="54.7 GiB" free_swap="0 B"
time=2025-03-12T14:23:37.279Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB"
time=2025-03-12T14:23:37.325Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:23:37.327Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-03-12T14:23:37.328Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:23:37.331Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-03-12T14:23:37.331Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-03-12T14:23:37.331Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-03-12T14:23:37.331Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-03-12T14:23:37.331Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30
time=2025-03-12T14:23:37.331Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-03-12T14:23:37.331Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 --ctx-size 32000 --batch-size 512 --n-gpu-layers 33 --threads 6 --parallel 1 --port 43925"
time=2025-03-12T14:23:37.331Z level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-12T14:23:37.331Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-03-12T14:23:37.332Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-03-12T14:23:37.337Z level=INFO source=runner.go:882 msg="starting ollama engine"
time=2025-03-12T14:23:37.338Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:43925"
time=2025-03-12T14:23:37.383Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default=""
time=2025-03-12T14:23:37.383Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default=""
time=2025-03-12T14:23:37.383Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=36
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090 Laptop GPU, compute capability 8.9, VMM: yes
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
time=2025-03-12T14:23:37.420Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-03-12T14:23:37.530Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="4.1 GiB"
time=2025-03-12T14:23:37.530Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="4.2 GiB"
time=2025-03-12T14:23:37.784Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding"
time=2025-03-12T14:23:38.037Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-03-12T14:23:38.463Z level=INFO source=ggml.go:356 msg="compute graph" backend=CUDA0 buffer_type=CUDA0
time=2025-03-12T14:23:38.463Z level=INFO source=ggml.go:356 msg="compute graph" backend=CPU buffer_type=CUDA_Host
time=2025-03-12T14:23:38.463Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:23:38.465Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-03-12T14:23:38.466Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:23:38.469Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-03-12T14:23:38.469Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-03-12T14:23:38.469Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-03-12T14:23:38.469Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-03-12T14:23:38.469Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30
time=2025-03-12T14:23:38.469Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-03-12T14:23:38.539Z level=INFO source=server.go:624 msg="llama runner started in 1.21 seconds"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 15554.32 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 16309888000
SIGSEGV: segmentation violation
PC=0x60fc5fd56fd0 m=135 sigcode=1 addr=0x58
signal arrived during cgo execution

goroutine 9 gp=0xc000103dc0 m=135 mp=0xc1ad8e8008 [syscall]:
runtime.cgocall(0x60fc5fdaafe0, 0xc000153b00)
	runtime/cgocall.go:167 +0x4b fp=0xc000153ad8 sp=0xc000153aa0 pc=0x60fc5ef7960b
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x7ee3b4003430, 0x7ee6281bd2e0)
	_cgo_gotypes.go:485 +0x4a fp=0xc000153b00 sp=0xc000153ad8 pc=0x60fc5f3641aa
github.com/ollama/ollama/ml/backend/ggml.Context.Compute.func1(...)
	github.com/ollama/ollama/ml/backend/ggml/ggml.go:497
github.com/ollama/ollama/ml/backend/ggml.Context.Compute({0xc000142000, 0x7ee1f0000ba0, 0x7ee6281bd2e0, 0x0, 0x2000}, {0xc21a1c7ca0, 0x1, 0x60fc60266210?})
	github.com/ollama/ollama/ml/backend/ggml/ggml.go:497 +0xbd fp=0xc000153b90 sp=0xc000153b00 pc=0x60fc5f36ca7d
github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc21a20b410?, {0xc21a1c7ca0?, 0x200?, 0x0?})
	<autogenerated>:1 +0x72 fp=0xc000153c08 sp=0xc000153b90 pc=0x60fc5f3724f2
github.com/ollama/ollama/model.Forward({0x60fc6025dc00, 0xc21a20b410}, {0x60fc602552b0, 0xc0003580e0}, {{0xc005ced800, 0x200, 0x200}, {0x0, 0x0, 0x0}, ...})
	github.com/ollama/ollama/model/model.go:303 +0x218 fp=0xc000153cf0 sp=0xc000153c08 pc=0x60fc5f399718
github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc00013d9e0)
	github.com/ollama/ollama/runner/ollamarunner/runner.go:395 +0x3bb fp=0xc000153f98 sp=0xc000153cf0 pc=0x60fc5f405dfb
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc00013d9e0, {0x60fc602565e0, 0xc000524910})
	github.com/ollama/ollama/runner/ollamarunner/runner.go:321 +0x4e fp=0xc000153fb8 sp=0xc000153f98 pc=0x60fc5f4059ee
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2()
	github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0x28 fp=0xc000153fe0 sp=0xc000153fb8 pc=0x60fc5f40a668
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000153fe8 sp=0xc000153fe0 pc=0x60fc5ef84021
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0xa9c

goroutine 1 gp=0xc000002380 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00014f648 sp=0xc00014f628 pc=0x60fc5ef7c8ee
runtime.netpollblock(0xc00014f698?, 0x5ef16226?, 0xfc?)
	runtime/netpoll.go:575 +0xf7 fp=0xc00014f680 sp=0xc00014f648 pc=0x60fc5ef416f7
internal/poll.runtime_pollWait(0x7ee7261a6eb0, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc00014f6a0 sp=0xc00014f680 pc=0x60fc5ef7bb05
internal/poll.(*pollDesc).wait(0xc0005a2800?, 0x900f1fcfe?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00014f6c8 sp=0xc00014f6a0 pc=0x60fc5f002f87
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc0005a2800)
	internal/poll/fd_unix.go:620 +0x295 fp=0xc00014f770 sp=0xc00014f6c8 pc=0x60fc5f008355
net.(*netFD).accept(0xc0005a2800)
	net/fd_unix.go:172 +0x29 fp=0xc00014f828 sp=0xc00014f770 pc=0x60fc5f07b169
net.(*TCPListener).accept(0xc000522cc0)
	net/tcpsock_posix.go:159 +0x1b fp=0xc00014f878 sp=0xc00014f828 pc=0x60fc5f090b1b
net.(*TCPListener).Accept(0xc000522cc0)
	net/tcpsock.go:380 +0x30 fp=0xc00014f8a8 sp=0xc00014f878 pc=0x60fc5f08f9d0
net/http.(*onceCloseListener).Accept(0xc0004c2480?)
	<autogenerated>:1 +0x24 fp=0xc00014f8c0 sp=0xc00014f8a8 pc=0x60fc5f2a6b44
net/http.(*Server).Serve(0xc000511100, {0x60fc60254318, 0xc000522cc0})
	net/http/server.go:3424 +0x30c fp=0xc00014f9f0 sp=0xc00014f8c0 pc=0x60fc5f27e40c
github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000034150, 0xe, 0xf})
	github.com/ollama/ollama/runner/ollamarunner/runner.go:939 +0xe6a fp=0xc00014fd08 sp=0xc00014f9f0 pc=0x60fc5f40a3aa
github.com/ollama/ollama/runner.Execute({0xc000034130?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc00014fd30 sp=0xc00014fd08 pc=0x60fc5f40af09
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000510f00?, {0x60fc5fdc6054?, 0x4?, 0x60fc5fdc6058?})
	github.com/ollama/ollama/cmd/cmd.go:1285 +0x45 fp=0xc00014fd58 sp=0xc00014fd30 pc=0x60fc5fb7b6a5
github.com/spf13/cobra.(*Command).execute(0xc0004c4f08, {0xc000144c30, 0xf, 0xf})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00014fe78 sp=0xc00014fd58 pc=0x60fc5f0f42fc
github.com/spf13/cobra.(*Command).ExecuteC(0xc0004ae908)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00014ff30 sp=0xc00014fe78 pc=0x60fc5f0f4b45
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x4d fp=0xc00014ff50 sp=0xc00014ff30 pc=0x60fc5fb7ba0d
runtime.main()
	runtime/proc.go:283 +0x29d fp=0xc00014ffe0 sp=0xc00014ff50 pc=0x60fc5ef48cfd
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00014ffe8 sp=0xc00014ffe0 pc=0x60fc5ef84021

goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000090fa8 sp=0xc000090f88 pc=0x60fc5ef7c8ee
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.forcegchelper()
	runtime/proc.go:348 +0xb8 fp=0xc000090fe0 sp=0xc000090fa8 pc=0x60fc5ef49038
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000090fe8 sp=0xc000090fe0 pc=0x60fc5ef84021
created by runtime.init.7 in goroutine 1
	runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000091780 sp=0xc000091760 pc=0x60fc5ef7c8ee
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.bgsweep(0xc000042100)
	runtime/mgcsweep.go:316 +0xdf fp=0xc0000917c8 sp=0xc000091780 pc=0x60fc5ef3385f
runtime.gcenable.gowrap1()
	runtime/mgc.go:204 +0x25 fp=0xc0000917e0 sp=0xc0000917c8 pc=0x60fc5ef27c45
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000917e8 sp=0xc0000917e0 pc=0x60fc5ef84021
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
runtime.gopark(0x60fc60abab40?, 0x60fc5ff7ca00?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000091f78 sp=0xc000091f58 pc=0x60fc5ef7c8ee
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.(*scavengerState).park(0x60fc60abab40)
	runtime/mgcscavenge.go:425 +0x49 fp=0xc000091fa8 sp=0xc000091f78 pc=0x60fc5ef312a9
runtime.bgscavenge(0xc000042100)
	runtime/mgcscavenge.go:658 +0x59 fp=0xc000091fc8 sp=0xc000091fa8 pc=0x60fc5ef31839
runtime.gcenable.gowrap2()
	runtime/mgc.go:205 +0x25 fp=0xc000091fe0 sp=0xc000091fc8 pc=0x60fc5ef27be5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000091fe8 sp=0xc000091fe0 pc=0x60fc5ef84021
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
runtime.gopark(0x1b8?, 0x60fc5ef4b2a9?, 0x1?, 0x23?, 0xc000090688?)
	runtime/proc.go:435 +0xce fp=0xc000090630 sp=0xc000090610 pc=0x60fc5ef7c8ee
runtime.runfinq()
	runtime/mfinal.go:196 +0x107 fp=0xc0000907e0 sp=0xc000090630 pc=0x60fc5ef26c07
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000907e8 sp=0xc0000907e0 pc=0x60fc5ef84021
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc0001f08c0 m=nil [chan receive]:
runtime.gopark(0xc0001979a0?, 0xc0032fc030?, 0x60?, 0x27?, 0x60fc5f061ea8?)
	runtime/proc.go:435 +0xce fp=0xc000092718 sp=0xc0000926f8 pc=0x60fc5ef7c8ee
runtime.chanrecv(0xc0000c8380, 0x0, 0x1)
	runtime/chan.go:664 +0x445 fp=0xc000092790 sp=0xc000092718 pc=0x60fc5ef18e05
runtime.chanrecv1(0x0?, 0x0?)
	runtime/chan.go:506 +0x12 fp=0xc0000927b8 sp=0xc000092790 pc=0x60fc5ef18992
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	runtime/mgc.go:1799 +0x2f fp=0xc0000927e0 sp=0xc0000927b8 pc=0x60fc5ef2adef
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000927e8 sp=0xc0000927e0 pc=0x60fc5ef84021
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc0001f0e00 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf424c610e4?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000092f38 sp=0xc000092f18 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc000092fc8 sp=0xc000092f38 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000092fe0 sp=0xc000092fc8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000092fe8 sp=0xc000092fe0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf424c612d9?, 0x3?, 0xf9?, 0xfc?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008c738 sp=0xc00008c718 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008c7c8 sp=0xc00008c738 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008c7e0 sp=0xc00008c7c8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008c7e8 sp=0xc00008c7e0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc0005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf424c61029?, 0x3?, 0xea?, 0x7e?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008cf38 sp=0xc00008cf18 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008cfc8 sp=0xc00008cf38 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008cfe0 sp=0xc00008cfc8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008cfe8 sp=0xc00008cfe0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000504380 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf424c60fe9?, 0x3?, 0x7a?, 0x8d?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008d738 sp=0xc00008d718 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008d7c8 sp=0xc00008d738 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008d7e0 sp=0xc00008d7c8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008d7e8 sp=0xc00008d7e0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 21 gp=0xc000504540 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf424c615d6?, 0x3?, 0x9?, 0x76?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008df38 sp=0xc00008df18 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008dfc8 sp=0xc00008df38 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008dfe0 sp=0xc00008dfc8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 22 gp=0xc000504700 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf434724ef4?, 0x1?, 0xe6?, 0x8?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008e738 sp=0xc00008e718 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008e7c8 sp=0xc00008e738 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008e7e0 sp=0xc00008e7c8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008e7e8 sp=0xc00008e7e0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 23 gp=0xc0005048c0 m=nil [GC worker (idle)]:
runtime.gopark(0x60fc60b692c0?, 0x1?, 0x80?, 0x85?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008ef38 sp=0xc00008ef18 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008efc8 sp=0xc00008ef38 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008efe0 sp=0xc00008efc8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008efe8 sp=0xc00008efe0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 24 gp=0xc000504a80 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf424c5fe6a?, 0x3?, 0xd5?, 0xab?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008f738 sp=0xc00008f718 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008f7c8 sp=0xc00008f738 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008f7e0 sp=0xc00008f7c8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008f7e8 sp=0xc00008f7e0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 25 gp=0xc000504c40 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf424c6114d?, 0x1?, 0xcf?, 0x37?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008ff38 sp=0xc00008ff18 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008ffc8 sp=0xc00008ff38 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008ffe0 sp=0xc00008ffc8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008ffe8 sp=0xc00008ffe0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 26 gp=0xc000504e00 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf424c6108f?, 0x3?, 0xd2?, 0xb8?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00050c738 sp=0xc00050c718 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00050c7c8 sp=0xc00050c738 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00050c7e0 sp=0xc00050c7c8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00050c7e8 sp=0xc00050c7e0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 27 gp=0xc000504fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x60fc60b692c0?, 0x1?, 0xf5?, 0x2c?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00050cf38 sp=0xc00050cf18 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00050cfc8 sp=0xc00050cf38 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00050cfe0 sp=0xc00050cfc8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00050cfe8 sp=0xc00050cfe0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf424c60ffd?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000508738 sp=0xc000508718 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc0005087c8 sp=0xc000508738 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0005087e0 sp=0xc0005087c8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0005087e8 sp=0xc0005087e0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc0001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x60fc60b692c0?, 0x3?, 0x9c?, 0x45?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000508f38 sp=0xc000508f18 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc000508fc8 sp=0xc000508f38 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000508fe0 sp=0xc000508fc8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000508fe8 sp=0xc000508fe0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 36 gp=0xc000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf424c61173?, 0x3?, 0x1d?, 0x40?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000509738 sp=0xc000509718 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc0005097c8 sp=0xc000509738 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0005097e0 sp=0xc0005097c8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0005097e8 sp=0xc0005097e0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 37 gp=0xc000102c40 m=nil [GC worker (idle)]:
runtime.gopark(0x60fc60b692c0?, 0x1?, 0x35?, 0x12?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000509f38 sp=0xc000509f18 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc000509fc8 sp=0xc000509f38 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000509fe0 sp=0xc000509fc8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000509fe8 sp=0xc000509fe0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 38 gp=0xc000102e00 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf4325936f0?, 0x1?, 0x1f?, 0xdb?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00050a738 sp=0xc00050a718 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00050a7c8 sp=0xc00050a738 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00050a7e0 sp=0xc00050a7c8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00050a7e8 sp=0xc00050a7e0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 39 gp=0xc000102fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf424c614a6?, 0x1?, 0xa0?, 0x20?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00050af38 sp=0xc00050af18 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00050afc8 sp=0xc00050af38 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00050afe0 sp=0xc00050afc8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00050afe8 sp=0xc00050afe0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 50 gp=0xc000584000 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf432593e51?, 0x1?, 0xdf?, 0xee?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00058a738 sp=0xc00058a718 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00058a7c8 sp=0xc00058a738 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00058a7e0 sp=0xc00058a7c8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00058a7e8 sp=0xc00058a7e0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 51 gp=0xc0005841c0 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf432594712?, 0x1?, 0xd5?, 0xf5?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00058af38 sp=0xc00058af18 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00058afc8 sp=0xc00058af38 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00058afe0 sp=0xc00058afc8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00058afe8 sp=0xc00058afe0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 52 gp=0xc000584380 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf432613ecb?, 0x1?, 0x91?, 0xeb?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00058b738 sp=0xc00058b718 pc=0x60fc5ef7c8ee
runtime.gcBgMarkWorker(0xc0000c9960)
	runtime/mgc.go:1423 +0xe9 fp=0xc00058b7c8 sp=0xc00058b738 pc=0x60fc5ef2a109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00058b7e0 sp=0xc00058b7c8 pc=0x60fc5ef29fe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00058b7e8 sp=0xc00058b7e0 pc=0x60fc5ef84021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 1138 gp=0xc003132700 m=nil [select]:
runtime.gopark(0xc038221a68?, 0x2?, 0x0?, 0x8b?, 0xc03822180c?)
	runtime/proc.go:435 +0xce fp=0xc038221620 sp=0xc038221600 pc=0x60fc5ef7c8ee
runtime.selectgo(0xc038221a68, 0xc038221808, 0x901?, 0x0, 0x1?, 0x1)
	runtime/select.go:351 +0x837 fp=0xc038221758 sp=0xc038221620 pc=0x60fc5ef5b1f7
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc00013d9e0, {0x60fc602544f8, 0xc005cea1c0}, 0xc000128280)
	github.com/ollama/ollama/runner/ollamarunner/runner.go:649 +0xad0 fp=0xc038221ac0 sp=0xc038221758 pc=0x60fc5f407e70
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x60fc602544f8?, 0xc005cea1c0?}, 0xc000153b40?)
	<autogenerated>:1 +0x36 fp=0xc038221af0 sp=0xc038221ac0 pc=0x60fc5f40aa36
net/http.HandlerFunc.ServeHTTP(0xc000141500?, {0x60fc602544f8?, 0xc005cea1c0?}, 0xc000153b60?)
	net/http/server.go:2294 +0x29 fp=0xc038221b18 sp=0xc038221af0 pc=0x60fc5f27aa49
net/http.(*ServeMux).ServeHTTP(0x60fc5ef21125?, {0x60fc602544f8, 0xc005cea1c0}, 0xc000128280)
	net/http/server.go:2822 +0x1c4 fp=0xc038221b68 sp=0xc038221b18 pc=0x60fc5f27c944
net/http.serverHandler.ServeHTTP({0x60fc60250b10?}, {0x60fc602544f8?, 0xc005cea1c0?}, 0x1?)
	net/http/server.go:3301 +0x8e fp=0xc038221b98 sp=0xc038221b68 pc=0x60fc5f29a3ce
net/http.(*conn).serve(0xc0004c2480, {0x60fc602565a8, 0xc000688d50})
	net/http/server.go:2102 +0x625 fp=0xc038221fb8 sp=0xc038221b98 pc=0x60fc5f278f45
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3454 +0x28 fp=0xc038221fe0 sp=0xc038221fb8 pc=0x60fc5f27e808
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc038221fe8 sp=0xc038221fe0 pc=0x60fc5ef84021
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3454 +0x485

goroutine 30 gp=0xc0032ff6c0 m=nil [IO wait]:
runtime.gopark(0xc004883df8?, 0x60fc5ef83c7c?, 0x20?, 0x3e?, 0xb?)
	runtime/proc.go:435 +0xce fp=0xc004883dd8 sp=0xc004883db8 pc=0x60fc5ef7c8ee
runtime.netpollblock(0x60fc5ef9fd78?, 0x5ef16226?, 0xfc?)
	runtime/netpoll.go:575 +0xf7 fp=0xc004883e10 sp=0xc004883dd8 pc=0x60fc5ef416f7
internal/poll.runtime_pollWait(0x7ee7261a6d98, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc004883e30 sp=0xc004883e10 pc=0x60fc5ef7bb05
internal/poll.(*pollDesc).wait(0xc0005a2080?, 0xc00652c821?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc004883e58 sp=0xc004883e30 pc=0x60fc5f002f87
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0005a2080, {0xc00652c821, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x27a fp=0xc004883ef0 sp=0xc004883e58 pc=0x60fc5f00427a
net.(*netFD).Read(0xc0005a2080, {0xc00652c821?, 0xc0030ae1d8?, 0xc004883f70?})
	net/fd_posix.go:55 +0x25 fp=0xc004883f38 sp=0xc004883ef0 pc=0x60fc5f0791c5
net.(*conn).Read(0xc000134190, {0xc00652c821?, 0xc0030af880?, 0x60fc5f362e40?})
	net/net.go:194 +0x45 fp=0xc004883f80 sp=0xc004883f38 pc=0x60fc5f087585
net/http.(*connReader).backgroundRead(0xc00652c810)
	net/http/server.go:690 +0x37 fp=0xc004883fc8 sp=0xc004883f80 pc=0x60fc5f272e17
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:686 +0x25 fp=0xc004883fe0 sp=0xc004883fc8 pc=0x60fc5f272d45
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc004883fe8 sp=0xc004883fe0 pc=0x60fc5ef84021
created by net/http.(*connReader).startBackgroundRead in goroutine 1138
	net/http/server.go:686 +0xb6

rax    0x7ee3b433d620
rbx    0x7ee3b433d590
rcx    0x2
rdx    0x7ee1f01beac0
rdi    0x0
rsi    0x7ee18cb3f030
rbp    0x7ee1f01beab0
rsp    0x7ee20affcc48
r8     0x4
r9     0xc000094048
r10    0x1
r11    0x206
r12    0x0
r13    0x7ee3b4003588
r14    0xa52
r15    0x7ee3b433d590
rip    0x60fc5fd56fd0
rflags 0x10206
cs     0x33
fs     0x0
gs     0x0
[GIN] 2025/03/12 - 14:23:39 | 500 |  8.107990206s |      172.17.0.2 | POST     "/api/chat"
time=2025-03-12T14:23:39.716Z level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 2"
time=2025-03-12T14:23:44.670Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.197443181 model=/home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3
time=2025-03-12T14:23:44.921Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.447655667 model=/home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3
time=2025-03-12T14:23:45.180Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.706586906 model=/home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3
time=2025-03-12T14:23:45.354Z level=INFO source=server.go:105 msg="system memory" total="62.4 GiB" free="54.1 GiB" free_swap="0 B"
time=2025-03-12T14:23:45.354Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB"
time=2025-03-12T14:23:45.403Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:23:45.405Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-03-12T14:23:45.406Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:23:45.408Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-03-12T14:23:45.409Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-03-12T14:23:45.409Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-03-12T14:23:45.409Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-03-12T14:23:45.409Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30
time=2025-03-12T14:23:45.409Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-03-12T14:23:45.409Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 --ctx-size 32000 --batch-size 512 --n-gpu-layers 33 --threads 6 --parallel 1 --port 33003"
time=2025-03-12T14:23:45.409Z level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-12T14:23:45.409Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-03-12T14:23:45.409Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-03-12T14:23:45.417Z level=INFO source=runner.go:882 msg="starting ollama engine"
time=2025-03-12T14:23:45.417Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:33003"
time=2025-03-12T14:23:45.465Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default=""
time=2025-03-12T14:23:45.465Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default=""
time=2025-03-12T14:23:45.465Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=36
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090 Laptop GPU, compute capability 8.9, VMM: yes
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
time=2025-03-12T14:23:45.502Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-03-12T14:23:45.610Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="4.2 GiB"
time=2025-03-12T14:23:45.610Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="4.1 GiB"
time=2025-03-12T14:23:45.673Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-03-12T14:23:46.577Z level=INFO source=ggml.go:356 msg="compute graph" backend=CUDA0 buffer_type=CUDA0
time=2025-03-12T14:23:46.577Z level=INFO source=ggml.go:356 msg="compute graph" backend=CPU buffer_type=CUDA_Host
time=2025-03-12T14:23:46.577Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:23:46.578Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-03-12T14:23:46.580Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:23:46.582Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-03-12T14:23:46.582Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-03-12T14:23:46.582Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-03-12T14:23:46.582Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-03-12T14:23:46.582Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30
time=2025-03-12T14:23:46.582Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-03-12T14:23:46.683Z level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 15554.32 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 16309888000
SIGSEGV: segmentation violation
PC=0x64a7b881cfd0 m=43 sigcode=1 addr=0x58
signal arrived during cgo execution

goroutine 15 gp=0xc00048ae00 m=43 mp=0xc004f00008 [syscall]:
runtime.cgocall(0x64a7b8870fe0, 0xc0005f7b00)
	runtime/cgocall.go:167 +0x4b fp=0xc0005f7ad8 sp=0xc0005f7aa0 pc=0x64a7b7a3f60b
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x7e6eb8003770, 0x7e73062bf2e0)
	_cgo_gotypes.go:485 +0x4a fp=0xc0005f7b00 sp=0xc0005f7ad8 pc=0x64a7b7e2a1aa
github.com/ollama/ollama/ml/backend/ggml.Context.Compute.func1(...)
	github.com/ollama/ollama/ml/backend/ggml/ggml.go:497
github.com/ollama/ollama/ml/backend/ggml.Context.Compute({0xc00043c040, 0x7e707c002d40, 0x7e73062bf2e0, 0x0, 0x2000}, {0xc0050d0e00, 0x1, 0x64a7b8d2c210?})
	github.com/ollama/ollama/ml/backend/ggml/ggml.go:497 +0xbd fp=0xc0005f7b90 sp=0xc0005f7b00 pc=0x64a7b7e32a7d
github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc005103b30?, {0xc0050d0e00?, 0x200?, 0x0?})
	<autogenerated>:1 +0x72 fp=0xc0005f7c08 sp=0xc0005f7b90 pc=0x64a7b7e384f2
github.com/ollama/ollama/model.Forward({0x64a7b8d23c00, 0xc005103b30}, {0x64a7b8d1b2b0, 0xc0000e8000}, {{0xc004b69800, 0x200, 0x200}, {0x0, 0x0, 0x0}, ...})
	github.com/ollama/ollama/model/model.go:303 +0x218 fp=0xc0005f7cf0 sp=0xc0005f7c08 pc=0x64a7b7e5f718
github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc000035d40)
	github.com/ollama/ollama/runner/ollamarunner/runner.go:395 +0x3bb fp=0xc0005f7f98 sp=0xc0005f7cf0 pc=0x64a7b7ecbdfb
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000035d40, {0x64a7b8d1c5e0, 0xc000611540})
	github.com/ollama/ollama/runner/ollamarunner/runner.go:321 +0x4e fp=0xc0005f7fb8 sp=0xc0005f7f98 pc=0x64a7b7ecb9ee
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2()
	github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0x28 fp=0xc0005f7fe0 sp=0xc0005f7fb8 pc=0x64a7b7ed0668
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0005f7fe8 sp=0xc0005f7fe0 pc=0x64a7b7a4a021
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0xa9c

goroutine 1 gp=0xc000002380 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc0005f3648 sp=0xc0005f3628 pc=0x64a7b7a428ee
runtime.netpollblock(0xc0005f3698?, 0xb79dc226?, 0xa7?)
	runtime/netpoll.go:575 +0xf7 fp=0xc0005f3680 sp=0xc0005f3648 pc=0x64a7b7a076f7
internal/poll.runtime_pollWait(0x7e7405985de0, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc0005f36a0 sp=0xc0005f3680 pc=0x64a7b7a41b05
internal/poll.(*pollDesc).wait(0xc0005b2600?, 0x9009e5cfe?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0005f36c8 sp=0xc0005f36a0 pc=0x64a7b7ac8f87
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc0005b2600)
	internal/poll/fd_unix.go:620 +0x295 fp=0xc0005f3770 sp=0xc0005f36c8 pc=0x64a7b7ace355
net.(*netFD).accept(0xc0005b2600)
	net/fd_unix.go:172 +0x29 fp=0xc0005f3828 sp=0xc0005f3770 pc=0x64a7b7b41169
net.(*TCPListener).accept(0xc0000bcf80)
	net/tcpsock_posix.go:159 +0x1b fp=0xc0005f3878 sp=0xc0005f3828 pc=0x64a7b7b56b1b
net.(*TCPListener).Accept(0xc0000bcf80)
	net/tcpsock.go:380 +0x30 fp=0xc0005f38a8 sp=0xc0005f3878 pc=0x64a7b7b559d0
net/http.(*onceCloseListener).Accept(0xc0005c2120?)
	<autogenerated>:1 +0x24 fp=0xc0005f38c0 sp=0xc0005f38a8 pc=0x64a7b7d6cb44
net/http.(*Server).Serve(0xc0001f9400, {0x64a7b8d1a318, 0xc0000bcf80})
	net/http/server.go:3424 +0x30c fp=0xc0005f39f0 sp=0xc0005f38c0 pc=0x64a7b7d4440c
github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000132030, 0xe, 0xf})
	github.com/ollama/ollama/runner/ollamarunner/runner.go:939 +0xe6a fp=0xc0005f3d08 sp=0xc0005f39f0 pc=0x64a7b7ed03aa
github.com/ollama/ollama/runner.Execute({0xc000132010?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc0005f3d30 sp=0xc0005f3d08 pc=0x64a7b7ed0f09
github.com/ollama/ollama/cmd.NewCLI.func2(0xc0001f9200?, {0x64a7b888c054?, 0x4?, 0x64a7b888c058?})
	github.com/ollama/ollama/cmd/cmd.go:1285 +0x45 fp=0xc0005f3d58 sp=0xc0005f3d30 pc=0x64a7b86416a5
github.com/spf13/cobra.(*Command).execute(0xc0005c4f08, {0xc00067eff0, 0xf, 0xf})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc0005f3e78 sp=0xc0005f3d58 pc=0x64a7b7bba2fc
github.com/spf13/cobra.(*Command).ExecuteC(0xc0005ac908)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc0005f3f30 sp=0xc0005f3e78 pc=0x64a7b7bbab45
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x4d fp=0xc0005f3f50 sp=0xc0005f3f30 pc=0x64a7b8641a0d
runtime.main()
	runtime/proc.go:283 +0x29d fp=0xc0005f3fe0 sp=0xc0005f3f50 pc=0x64a7b7a0ecfd
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0005f3fe8 sp=0xc0005f3fe0 pc=0x64a7b7a4a021

goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000090fa8 sp=0xc000090f88 pc=0x64a7b7a428ee
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.forcegchelper()
	runtime/proc.go:348 +0xb8 fp=0xc000090fe0 sp=0xc000090fa8 pc=0x64a7b7a0f038
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000090fe8 sp=0xc000090fe0 pc=0x64a7b7a4a021
created by runtime.init.7 in goroutine 1
	runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000091780 sp=0xc000091760 pc=0x64a7b7a428ee
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.bgsweep(0xc000042100)
	runtime/mgcsweep.go:316 +0xdf fp=0xc0000917c8 sp=0xc000091780 pc=0x64a7b79f985f
runtime.gcenable.gowrap1()
	runtime/mgc.go:204 +0x25 fp=0xc0000917e0 sp=0xc0000917c8 pc=0x64a7b79edc45
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000917e8 sp=0xc0000917e0 pc=0x64a7b7a4a021
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x64a7b8a42a00?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000091f78 sp=0xc000091f58 pc=0x64a7b7a428ee
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.(*scavengerState).park(0x64a7b9580b40)
	runtime/mgcscavenge.go:425 +0x49 fp=0xc000091fa8 sp=0xc000091f78 pc=0x64a7b79f72a9
runtime.bgscavenge(0xc000042100)
	runtime/mgcscavenge.go:658 +0x59 fp=0xc000091fc8 sp=0xc000091fa8 pc=0x64a7b79f7839
runtime.gcenable.gowrap2()
	runtime/mgc.go:205 +0x25 fp=0xc000091fe0 sp=0xc000091fc8 pc=0x64a7b79edbe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000091fe8 sp=0xc000091fe0 pc=0x64a7b7a4a021
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:205 +0xa5

goroutine 18 gp=0xc000102700 m=nil [finalizer wait]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000090688?)
	runtime/proc.go:435 +0xce fp=0xc000090630 sp=0xc000090610 pc=0x64a7b7a428ee
runtime.runfinq()
	runtime/mfinal.go:196 +0x107 fp=0xc0000907e0 sp=0xc000090630 pc=0x64a7b79ecc07
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000907e8 sp=0xc0000907e0 pc=0x64a7b7a4a021
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:166 +0x3d

goroutine 19 gp=0xc000103180 m=nil [chan receive]:
runtime.gopark(0xc0002297c0?, 0xc00016c180?, 0x60?, 0xc7?, 0x64a7b7b27ea8?)
	runtime/proc.go:435 +0xce fp=0xc00008c718 sp=0xc00008c6f8 pc=0x64a7b7a428ee
runtime.chanrecv(0xc000110380, 0x0, 0x1)
	runtime/chan.go:664 +0x445 fp=0xc00008c790 sp=0xc00008c718 pc=0x64a7b79dee05
runtime.chanrecv1(0x0?, 0x0?)
	runtime/chan.go:506 +0x12 fp=0xc00008c7b8 sp=0xc00008c790 pc=0x64a7b79de992
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	runtime/mgc.go:1799 +0x2f fp=0xc00008c7e0 sp=0xc00008c7b8 pc=0x64a7b79f0def
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008c7e8 sp=0xc00008c7e0 pc=0x64a7b7a4a021
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	runtime/mgc.go:1794 +0x85

goroutine 20 gp=0xc000103500 m=nil [GC worker (idle)]:
runtime.gopark(0x64a7b962f2c0?, 0x1?, 0xf6?, 0x45?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008cf38 sp=0xc00008cf18 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008cfc8 sp=0xc00008cf38 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008cfe0 sp=0xc00008cfc8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008cfe8 sp=0xc00008cfe0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 21 gp=0xc0001036c0 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf62c640f7e?, 0x1?, 0x87?, 0xaa?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008d738 sp=0xc00008d718 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008d7c8 sp=0xc00008d738 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008d7e0 sp=0xc00008d7c8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008d7e8 sp=0xc00008d7e0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 22 gp=0xc000103880 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf60fff63af?, 0x3?, 0x51?, 0x97?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008df38 sp=0xc00008df18 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008dfc8 sp=0xc00008df38 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008dfe0 sp=0xc00008dfc8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 23 gp=0xc000103a40 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf60fff7152?, 0x3?, 0x93?, 0xf7?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008e738 sp=0xc00008e718 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008e7c8 sp=0xc00008e738 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008e7e0 sp=0xc00008e7c8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008e7e8 sp=0xc00008e7e0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 24 gp=0xc000103c00 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf62d507189?, 0x1?, 0x4e?, 0x73?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008ef38 sp=0xc00008ef18 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008efc8 sp=0xc00008ef38 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008efe0 sp=0xc00008efc8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008efe8 sp=0xc00008efe0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 25 gp=0xc000103dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf62d05f63a?, 0x1?, 0x24?, 0x11?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008f738 sp=0xc00008f718 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008f7c8 sp=0xc00008f738 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008f7e0 sp=0xc00008f7c8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008f7e8 sp=0xc00008f7e0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 26 gp=0xc000478000 m=nil [GC worker (idle)]:
runtime.gopark(0x64a7b962f2c0?, 0x1?, 0x3a?, 0x7?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00008ff38 sp=0xc00008ff18 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00008ffc8 sp=0xc00008ff38 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00008ffe0 sp=0xc00008ffc8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00008ffe8 sp=0xc00008ffe0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc0003b8000 m=nil [GC worker (idle)]:
runtime.gopark(0x64a7b962f2c0?, 0x1?, 0xd3?, 0x46?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000484738 sp=0xc000484718 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0004847c8 sp=0xc000484738 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0004847e0 sp=0xc0004847c8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0004847e8 sp=0xc0004847e0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc0003b81c0 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf60ffee25c?, 0x1?, 0x79?, 0x74?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000484f38 sp=0xc000484f18 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000484fc8 sp=0xc000484f38 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000484fe0 sp=0xc000484fc8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000484fe8 sp=0xc000484fe0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 36 gp=0xc0003b8380 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf62d05e2ed?, 0x1?, 0xb7?, 0x49?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000485738 sp=0xc000485718 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0004857c8 sp=0xc000485738 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0004857e0 sp=0xc0004857c8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0004857e8 sp=0xc0004857e0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 37 gp=0xc0003b8540 m=nil [GC worker (idle)]:
runtime.gopark(0x64a7b962f2c0?, 0x3?, 0x7?, 0x4f?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000485f38 sp=0xc000485f18 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000485fc8 sp=0xc000485f38 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000485fe0 sp=0xc000485fc8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000485fe8 sp=0xc000485fe0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 5 gp=0xc000003a40 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf60fff4eb4?, 0x3?, 0x54?, 0x29?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000092738 sp=0xc000092718 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0000927c8 sp=0xc000092738 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0000927e0 sp=0xc0000927c8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000927e8 sp=0xc0000927e0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 6 gp=0xc000003c00 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf610034dde?, 0x3?, 0x8d?, 0x13?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000092f38 sp=0xc000092f18 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000092fc8 sp=0xc000092f38 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000092fe0 sp=0xc000092fc8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000092fe8 sp=0xc000092fe0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 7 gp=0xc000003dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf60fff0551?, 0x3?, 0xfd?, 0x35?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000093738 sp=0xc000093718 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0000937c8 sp=0xc000093738 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0000937e0 sp=0xc0000937c8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000937e8 sp=0xc0000937e0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 8 gp=0xc0000c8000 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf62c5331e9?, 0x1?, 0x8d?, 0xe5?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000093f38 sp=0xc000093f18 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000093fc8 sp=0xc000093f38 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000093fe0 sp=0xc000093fc8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000093fe8 sp=0xc000093fe0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 9 gp=0xc0000c81c0 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf610030717?, 0x1?, 0xaa?, 0x25?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000480738 sp=0xc000480718 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0004807c8 sp=0xc000480738 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0004807e0 sp=0xc0004807c8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0004807e8 sp=0xc0004807e0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 10 gp=0xc0000c8380 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf62ca87c98?, 0x1?, 0xfb?, 0xc?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000480f38 sp=0xc000480f18 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000480fc8 sp=0xc000480f38 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000480fe0 sp=0xc000480fc8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000480fe8 sp=0xc000480fe0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 11 gp=0xc0000c8540 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf60ffe5948?, 0x1?, 0x23?, 0xa2?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000481738 sp=0xc000481718 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0004817c8 sp=0xc000481738 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0004817e0 sp=0xc0004817c8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0004817e8 sp=0xc0004817e0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 12 gp=0xc0000c8700 m=nil [GC worker (idle)]:
runtime.gopark(0x3fbf62c420cca?, 0x1?, 0xca?, 0xc?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000481f38 sp=0xc000481f18 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000481fc8 sp=0xc000481f38 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000481fe0 sp=0xc000481fc8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000481fe8 sp=0xc000481fe0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 13 gp=0xc0000c88c0 m=nil [GC worker (idle)]:
runtime.gopark(0x64a7b962f2c0?, 0x1?, 0x76?, 0x60?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000482738 sp=0xc000482718 pc=0x64a7b7a428ee
runtime.gcBgMarkWorker(0xc0001117a0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0004827c8 sp=0xc000482738 pc=0x64a7b79f0109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0004827e0 sp=0xc0004827c8 pc=0x64a7b79effe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0004827e8 sp=0xc0004827e0 pc=0x64a7b7a4a021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 1170 gp=0xc00048ac40 m=nil [IO wait]:
runtime.gopark(0x64a7b79e7046?, 0xc000489808?, 0x0?, 0xc0?, 0xb?)
	runtime/proc.go:435 +0xce fp=0xc004c23dd8 sp=0xc004c23db8 pc=0x64a7b7a428ee
runtime.netpollblock(0x64a7b7a65d78?, 0xb79dc226?, 0xa7?)
	runtime/netpoll.go:575 +0xf7 fp=0xc004c23e10 sp=0xc004c23dd8 pc=0x64a7b7a076f7
internal/poll.runtime_pollWait(0x7e7405985cc8, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc004c23e30 sp=0xc004c23e10 pc=0x64a7b7a41b05
internal/poll.(*pollDesc).wait(0xc0005b2000?, 0xc003b007f1?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc004c23e58 sp=0xc004c23e30 pc=0x64a7b7ac8f87
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0005b2000, {0xc003b007f1, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x27a fp=0xc004c23ef0 sp=0xc004c23e58 pc=0x64a7b7aca27a
net.(*netFD).Read(0xc0005b2000, {0xc003b007f1?, 0x0?, 0x0?})
	net/fd_posix.go:55 +0x25 fp=0xc004c23f38 sp=0xc004c23ef0 pc=0x64a7b7b3f1c5
net.(*conn).Read(0xc003aa80c8, {0xc003b007f1?, 0xc00313b880?, 0x64a7b7e28e40?})
	net/net.go:194 +0x45 fp=0xc004c23f80 sp=0xc004c23f38 pc=0x64a7b7b4d585
net/http.(*connReader).backgroundRead(0xc003b007e0)
	net/http/server.go:690 +0x37 fp=0xc004c23fc8 sp=0xc004c23f80 pc=0x64a7b7d38e17
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:686 +0x25 fp=0xc004c23fe0 sp=0xc004c23fc8 pc=0x64a7b7d38d45
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc004c23fe8 sp=0xc004c23fe0 pc=0x64a7b7a4a021
created by net/http.(*connReader).startBackgroundRead in goroutine 914
	net/http/server.go:686 +0xb6

goroutine 914 gp=0xc000502fc0 m=nil [select]:
runtime.gopark(0xc00004ba68?, 0x2?, 0x0?, 0x5?, 0xc00004b80c?)
	runtime/proc.go:435 +0xce fp=0xc00004b620 sp=0xc00004b600 pc=0x64a7b7a428ee
runtime.selectgo(0xc00004ba68, 0xc00004b808, 0x8d9?, 0x0, 0x1?, 0x1)
	runtime/select.go:351 +0x837 fp=0xc00004b758 sp=0xc00004b620 pc=0x64a7b7a211f7
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc000035d40, {0x64a7b8d1a4f8, 0xc004b66000}, 0xc004ac5400)
	github.com/ollama/ollama/runner/ollamarunner/runner.go:649 +0xad0 fp=0xc00004bac0 sp=0xc00004b758 pc=0x64a7b7ecde70
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x64a7b8d1a4f8?, 0xc004b66000?}, 0xc0005f7b40?)
	<autogenerated>:1 +0x36 fp=0xc00004baf0 sp=0xc00004bac0 pc=0x64a7b7ed0a36
net/http.HandlerFunc.ServeHTTP(0xc00065c780?, {0x64a7b8d1a4f8?, 0xc004b66000?}, 0xc0005f7b60?)
	net/http/server.go:2294 +0x29 fp=0xc00004bb18 sp=0xc00004baf0 pc=0x64a7b7d40a49
net/http.(*ServeMux).ServeHTTP(0x64a7b79e7125?, {0x64a7b8d1a4f8, 0xc004b66000}, 0xc004ac5400)
	net/http/server.go:2822 +0x1c4 fp=0xc00004bb68 sp=0xc00004bb18 pc=0x64a7b7d42944
net/http.serverHandler.ServeHTTP({0x64a7b8d16b10?}, {0x64a7b8d1a4f8?, 0xc004b66000?}, 0x1?)
	net/http/server.go:3301 +0x8e fp=0xc00004bb98 sp=0xc00004bb68 pc=0x64a7b7d603ce
net/http.(*conn).serve(0xc0005c2120, {0x64a7b8d1c5a8, 0xc0005c0b10})
	net/http/server.go:2102 +0x625 fp=0xc00004bfb8 sp=0xc00004bb98 pc=0x64a7b7d3ef45
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3454 +0x28 fp=0xc00004bfe0 sp=0xc00004bfb8 pc=0x64a7b7d44808
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00004bfe8 sp=0xc00004bfe0 pc=0x64a7b7a4a021
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3454 +0x485

rax    0x7e6eb833d960
rbx    0x7e6eb833d8d0
rcx    0x2
rdx    0x7e707c1c7840
rdi    0x0
rsi    0x7e6e6db39030
rbp    0x7e707c1c7830
rsp    0x7e70937fdc48
r8     0x4
r9     0xc00011c040
r10    0x1
r11    0x206
r12    0x0
r13    0x7e6eb80038c8
r14    0xa52
r15    0x7e6eb833d8d0
rip    0x64a7b881cfd0
rflags 0x10206
cs     0x33
fs     0x0
gs     0x0
[GIN] 2025/03/12 - 14:23:47 | 500 |  7.998409014s |      172.17.0.2 | POST     "/api/chat"
time=2025-03-12T14:23:47.709Z level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 2"

here's the logs when using ollama run..

time=2025-03-12T14:42:20.304Z level=INFO source=images.go:432 msg="total blobs: 55"
time=2025-03-12T14:42:20.304Z level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-12T14:42:20.304Z level=INFO source=routes.go:1292 msg="Listening on [::]:11434 (version 0.6.0)"
time=2025-03-12T14:42:20.304Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-12T14:42:20.540Z level=INFO source=types.go:130 msg="inference compute" id=GPU-07f2832b-0aaa-0fda-9cc3-d53aecc3d142 library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4090 Laptop GPU" total="15.6 GiB" available="14.5 GiB"
[GIN] 2025/03/12 - 14:42:26 | 200 |      27.388µs |       127.0.0.1 | HEAD     "/"
[GIN] 2025/03/12 - 14:42:26 | 200 |   26.808883ms |       127.0.0.1 | POST     "/api/show"
time=2025-03-12T14:42:26.763Z level=INFO source=server.go:105 msg="system memory" total="62.4 GiB" free="54.9 GiB" free_swap="0 B"
time=2025-03-12T14:42:26.764Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=49 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="18.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="992.0 MiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="15.3 GiB" memory.weights.repeating="14.2 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB"
time=2025-03-12T14:42:26.813Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:42:26.815Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-03-12T14:42:26.816Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:42:26.819Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-03-12T14:42:26.819Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-03-12T14:42:26.819Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-03-12T14:42:26.819Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-03-12T14:42:26.819Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30
time=2025-03-12T14:42:26.819Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-03-12T14:42:26.819Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 --ctx-size 2048 --batch-size 512 --n-gpu-layers 49 --threads 6 --parallel 1 --port 38197"
time=2025-03-12T14:42:26.820Z level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-12T14:42:26.820Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-03-12T14:42:26.820Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-03-12T14:42:26.827Z level=INFO source=runner.go:882 msg="starting ollama engine"
time=2025-03-12T14:42:26.827Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:38197"
time=2025-03-12T14:42:26.875Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default=""
time=2025-03-12T14:42:26.875Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default=""
time=2025-03-12T14:42:26.875Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=36
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090 Laptop GPU, compute capability 8.9, VMM: yes
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
time=2025-03-12T14:42:26.930Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-03-12T14:42:27.037Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="6.0 GiB"
time=2025-03-12T14:42:27.037Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="11.2 GiB"
time=2025-03-12T14:42:27.072Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-03-12T14:42:29.261Z level=INFO source=ggml.go:356 msg="compute graph" backend=CUDA0 buffer_type=CUDA0
time=2025-03-12T14:42:29.261Z level=INFO source=ggml.go:356 msg="compute graph" backend=CPU buffer_type=CUDA_Host
time=2025-03-12T14:42:29.262Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:42:29.263Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-03-12T14:42:29.264Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T14:42:29.267Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-03-12T14:42:29.267Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-03-12T14:42:29.267Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-03-12T14:42:29.267Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-03-12T14:42:29.267Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30
time=2025-03-12T14:42:29.267Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-03-12T14:42:29.343Z level=INFO source=server.go:624 msg="llama runner started in 2.52 seconds"
[GIN] 2025/03/12 - 14:42:29 | 200 |  3.024439048s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2025/03/12 - 14:42:48 | 200 |  4.071745585s |       127.0.0.1 | POST     "/api/chat"

@nachocheeseburger commented on GitHub (Mar 12, 2025): here's the logs when using api.. ```2025/03/12 14:23:03 routes.go:1225: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/arch/Desktop/AIstuff/llms/models/ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" time=2025-03-12T14:23:03.214Z level=INFO source=images.go:432 msg="total blobs: 55" time=2025-03-12T14:23:03.214Z level=INFO source=images.go:439 msg="total unused blobs removed: 0" time=2025-03-12T14:23:03.215Z level=INFO source=routes.go:1292 msg="Listening on [::]:11434 (version 0.6.0)" time=2025-03-12T14:23:03.215Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs" time=2025-03-12T14:23:03.447Z level=INFO source=types.go:130 msg="inference compute" id=GPU-07f2832b-0aaa-0fda-9cc3-d53aecc3d142 library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4090 Laptop GPU" total="15.6 GiB" available="14.5 GiB" time=2025-03-12T14:23:29.240Z level=INFO source=server.go:105 msg="system memory" total="62.4 GiB" free="54.8 GiB" free_swap="0 B" time=2025-03-12T14:23:29.240Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB" time=2025-03-12T14:23:29.288Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:23:29.289Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-03-12T14:23:29.291Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:23:29.294Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-03-12T14:23:29.294Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-03-12T14:23:29.294Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-03-12T14:23:29.294Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-03-12T14:23:29.294Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30 time=2025-03-12T14:23:29.294Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-03-12T14:23:29.294Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 --ctx-size 32000 --batch-size 512 --n-gpu-layers 33 --threads 6 --parallel 1 --port 36283" time=2025-03-12T14:23:29.294Z level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-03-12T14:23:29.294Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-03-12T14:23:29.295Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-03-12T14:23:29.301Z level=INFO source=runner.go:882 msg="starting ollama engine" time=2025-03-12T14:23:29.301Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:36283" time=2025-03-12T14:23:29.349Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default="" time=2025-03-12T14:23:29.349Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default="" time=2025-03-12T14:23:29.349Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=36 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 4090 Laptop GPU, compute capability 8.9, VMM: yes load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so time=2025-03-12T14:23:29.386Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-03-12T14:23:29.493Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="4.1 GiB" time=2025-03-12T14:23:29.493Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="4.2 GiB" time=2025-03-12T14:23:29.549Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-03-12T14:23:30.436Z level=INFO source=ggml.go:356 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 time=2025-03-12T14:23:30.436Z level=INFO source=ggml.go:356 msg="compute graph" backend=CPU buffer_type=CUDA_Host time=2025-03-12T14:23:30.436Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:23:30.437Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-03-12T14:23:30.439Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:23:30.441Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-03-12T14:23:30.441Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-03-12T14:23:30.441Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-03-12T14:23:30.441Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-03-12T14:23:30.441Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30 time=2025-03-12T14:23:30.441Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-03-12T14:23:30.555Z level=INFO source=server.go:624 msg="llama runner started in 1.26 seconds" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 15554.32 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 16309888000 SIGSEGV: segmentation violation PC=0x6415ab24ffd0 m=164 sigcode=1 addr=0x58 signal arrived during cgo execution goroutine 10 gp=0xc000285dc0 m=164 mp=0xc00350e808 [syscall]: runtime.cgocall(0x6415ab2a3fe0, 0xc0002dbb00) runtime/cgocall.go:167 +0x4b fp=0xc0002dbad8 sp=0xc0002dbaa0 pc=0x6415aa47260b github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x74fbe801a170, 0x7500404de2e0) _cgo_gotypes.go:485 +0x4a fp=0xc0002dbb00 sp=0xc0002dbad8 pc=0x6415aa85d1aa github.com/ollama/ollama/ml/backend/ggml.Context.Compute.func1(...) github.com/ollama/ollama/ml/backend/ggml/ggml.go:497 github.com/ollama/ollama/ml/backend/ggml.Context.Compute({0xc00059e040, 0x74fbb8002fd0, 0x7500404de2e0, 0x0, 0x2000}, {0xc06ffa41a0, 0x1, 0x6415ab75f210?}) github.com/ollama/ollama/ml/backend/ggml/ggml.go:497 +0xbd fp=0xc0002dbb90 sp=0xc0002dbb00 pc=0x6415aa865a7d github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc06fe4a2d0?, {0xc06ffa41a0?, 0x200?, 0x0?}) <autogenerated>:1 +0x72 fp=0xc0002dbc08 sp=0xc0002dbb90 pc=0x6415aa86b4f2 github.com/ollama/ollama/model.Forward({0x6415ab756c00, 0xc06fe4a2d0}, {0x6415ab74e2b0, 0xc00034e000}, {{0xc00440f800, 0x200, 0x200}, {0x0, 0x0, 0x0}, ...}) github.com/ollama/ollama/model/model.go:303 +0x218 fp=0xc0002dbcf0 sp=0xc0002dbc08 pc=0x6415aa892718 github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc0005c19e0) github.com/ollama/ollama/runner/ollamarunner/runner.go:395 +0x3bb fp=0xc0002dbf98 sp=0xc0002dbcf0 pc=0x6415aa8fedfb github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0005c19e0, {0x6415ab74f5e0, 0xc000524370}) github.com/ollama/ollama/runner/ollamarunner/runner.go:321 +0x4e fp=0xc0002dbfb8 sp=0xc0002dbf98 pc=0x6415aa8fe9ee github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2() github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0x28 fp=0xc0002dbfe0 sp=0xc0002dbfb8 pc=0x6415aa903668 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0002dbfe8 sp=0xc0002dbfe0 pc=0x6415aa47d021 created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0xa9c goroutine 1 gp=0xc000002380 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc0002d7648 sp=0xc0002d7628 pc=0x6415aa4758ee runtime.netpollblock(0xc0002d7698?, 0xaa40f226?, 0x15?) runtime/netpoll.go:575 +0xf7 fp=0xc0002d7680 sp=0xc0002d7648 pc=0x6415aa43a6f7 internal/poll.runtime_pollWait(0x75018525feb0, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc0002d76a0 sp=0xc0002d7680 pc=0x6415aa474b05 internal/poll.(*pollDesc).wait(0xc0004a3e00?, 0x0?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0002d76c8 sp=0xc0002d76a0 pc=0x6415aa4fbf87 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0xc0004a3e00) internal/poll/fd_unix.go:620 +0x295 fp=0xc0002d7770 sp=0xc0002d76c8 pc=0x6415aa501355 net.(*netFD).accept(0xc0004a3e00) net/fd_unix.go:172 +0x29 fp=0xc0002d7828 sp=0xc0002d7770 pc=0x6415aa574169 net.(*TCPListener).accept(0xc000520800) net/tcpsock_posix.go:159 +0x1b fp=0xc0002d7878 sp=0xc0002d7828 pc=0x6415aa589b1b net.(*TCPListener).Accept(0xc000520800) net/tcpsock.go:380 +0x30 fp=0xc0002d78a8 sp=0xc0002d7878 pc=0x6415aa5889d0 net/http.(*onceCloseListener).Accept(0xc0004c0120?) <autogenerated>:1 +0x24 fp=0xc0002d78c0 sp=0xc0002d78a8 pc=0x6415aa79fb44 net/http.(*Server).Serve(0xc000587100, {0x6415ab74d318, 0xc000520800}) net/http/server.go:3424 +0x30c fp=0xc0002d79f0 sp=0xc0002d78c0 pc=0x6415aa77740c github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000034150, 0xe, 0xf}) github.com/ollama/ollama/runner/ollamarunner/runner.go:939 +0xe6a fp=0xc0002d7d08 sp=0xc0002d79f0 pc=0x6415aa9033aa github.com/ollama/ollama/runner.Execute({0xc000034130?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc0002d7d30 sp=0xc0002d7d08 pc=0x6415aa903f09 github.com/ollama/ollama/cmd.NewCLI.func2(0xc000586f00?, {0x6415ab2bf054?, 0x4?, 0x6415ab2bf058?}) github.com/ollama/ollama/cmd/cmd.go:1285 +0x45 fp=0xc0002d7d58 sp=0xc0002d7d30 pc=0x6415ab0746a5 github.com/spf13/cobra.(*Command).execute(0xc0004c2f08, {0xc0005d8c30, 0xf, 0xf}) github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc0002d7e78 sp=0xc0002d7d58 pc=0x6415aa5ed2fc github.com/spf13/cobra.(*Command).ExecuteC(0xc00049af08) github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc0002d7f30 sp=0xc0002d7e78 pc=0x6415aa5edb45 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x4d fp=0xc0002d7f50 sp=0xc0002d7f30 pc=0x6415ab074a0d runtime.main() runtime/proc.go:283 +0x29d fp=0xc0002d7fe0 sp=0xc0002d7f50 pc=0x6415aa441cfd runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0002d7fe8 sp=0xc0002d7fe0 pc=0x6415aa47d021 goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000090fa8 sp=0xc000090f88 pc=0x6415aa4758ee runtime.goparkunlock(...) runtime/proc.go:441 runtime.forcegchelper() runtime/proc.go:348 +0xb8 fp=0xc000090fe0 sp=0xc000090fa8 pc=0x6415aa442038 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000090fe8 sp=0xc000090fe0 pc=0x6415aa47d021 created by runtime.init.7 in goroutine 1 runtime/proc.go:336 +0x1a goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000091780 sp=0xc000091760 pc=0x6415aa4758ee runtime.goparkunlock(...) runtime/proc.go:441 runtime.bgsweep(0xc000042100) runtime/mgcsweep.go:316 +0xdf fp=0xc0000917c8 sp=0xc000091780 pc=0x6415aa42c85f runtime.gcenable.gowrap1() runtime/mgc.go:204 +0x25 fp=0xc0000917e0 sp=0xc0000917c8 pc=0x6415aa420c45 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000917e8 sp=0xc0000917e0 pc=0x6415aa47d021 created by runtime.gcenable in goroutine 1 runtime/mgc.go:204 +0x66 goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x6415ab475a00?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000091f78 sp=0xc000091f58 pc=0x6415aa4758ee runtime.goparkunlock(...) runtime/proc.go:441 runtime.(*scavengerState).park(0x6415abfb3b40) runtime/mgcscavenge.go:425 +0x49 fp=0xc000091fa8 sp=0xc000091f78 pc=0x6415aa42a2a9 runtime.bgscavenge(0xc000042100) runtime/mgcscavenge.go:658 +0x59 fp=0xc000091fc8 sp=0xc000091fa8 pc=0x6415aa42a839 runtime.gcenable.gowrap2() runtime/mgc.go:205 +0x25 fp=0xc000091fe0 sp=0xc000091fc8 pc=0x6415aa420be5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000091fe8 sp=0xc000091fe0 pc=0x6415aa47d021 created by runtime.gcenable in goroutine 1 runtime/mgc.go:205 +0xa5 goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]: runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000090688?) runtime/proc.go:435 +0xce fp=0xc000090630 sp=0xc000090610 pc=0x6415aa4758ee runtime.runfinq() runtime/mfinal.go:196 +0x107 fp=0xc0000907e0 sp=0xc000090630 pc=0x6415aa41fc07 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000907e8 sp=0xc0000907e0 pc=0x6415aa47d021 created by runtime.createfing in goroutine 1 runtime/mfinal.go:166 +0x3d goroutine 6 gp=0xc00016e8c0 m=nil [chan receive]: runtime.gopark(0xc0001c74a0?, 0xc0700c0018?, 0x60?, 0x27?, 0x6415aa55aea8?) runtime/proc.go:435 +0xce fp=0xc000092718 sp=0xc0000926f8 pc=0x6415aa4758ee runtime.chanrecv(0xc0000c8380, 0x0, 0x1) runtime/chan.go:664 +0x445 fp=0xc000092790 sp=0xc000092718 pc=0x6415aa411e05 runtime.chanrecv1(0x0?, 0x0?) runtime/chan.go:506 +0x12 fp=0xc0000927b8 sp=0xc000092790 pc=0x6415aa411992 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() runtime/mgc.go:1799 +0x2f fp=0xc0000927e0 sp=0xc0000927b8 pc=0x6415aa423def runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000927e8 sp=0xc0000927e0 pc=0x6415aa47d021 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 runtime/mgc.go:1794 +0x85 goroutine 7 gp=0xc00016ee00 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000092f38 sp=0xc000092f18 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc000092fc8 sp=0xc000092f38 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000092fe0 sp=0xc000092fc8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000092fe8 sp=0xc000092fe0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 18 gp=0xc000502380 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf24fd87c1a?, 0x1?, 0x3?, 0x8f?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008c738 sp=0xc00008c718 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc00008c7c8 sp=0xc00008c738 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008c7e0 sp=0xc00008c7c8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008c7e8 sp=0xc00008c7e0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 19 gp=0xc000502540 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf245d1a7cd?, 0x3?, 0x59?, 0xef?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008cf38 sp=0xc00008cf18 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc00008cfc8 sp=0xc00008cf38 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008cfe0 sp=0xc00008cfc8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008cfe8 sp=0xc00008cfe0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 20 gp=0xc000502700 m=nil [GC worker (idle)]: runtime.gopark(0x6415ac0622c0?, 0x3?, 0x92?, 0x4e?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008d738 sp=0xc00008d718 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc00008d7c8 sp=0xc00008d738 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008d7e0 sp=0xc00008d7c8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008d7e8 sp=0xc00008d7e0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 34 gp=0xc000284380 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf245d1aed8?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00029a738 sp=0xc00029a718 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc00029a7c8 sp=0xc00029a738 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00029a7e0 sp=0xc00029a7c8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00029a7e8 sp=0xc00029a7e0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 35 gp=0xc000284540 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf245d1c08e?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00029af38 sp=0xc00029af18 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc00029afc8 sp=0xc00029af38 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00029afe0 sp=0xc00029afc8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00029afe8 sp=0xc00029afe0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 50 gp=0xc000584000 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf25be6146f?, 0x1?, 0x6c?, 0xfd?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000296738 sp=0xc000296718 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc0002967c8 sp=0xc000296738 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0002967e0 sp=0xc0002967c8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0002967e8 sp=0xc0002967e0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 51 gp=0xc0005841c0 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf245d1c14c?, 0x3?, 0x82?, 0xc?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000296f38 sp=0xc000296f18 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc000296fc8 sp=0xc000296f38 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000296fe0 sp=0xc000296fc8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000296fe8 sp=0xc000296fe0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 52 gp=0xc000584380 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf24f7c0fdc?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000297738 sp=0xc000297718 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc0002977c8 sp=0xc000297738 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0002977e0 sp=0xc0002977c8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0002977e8 sp=0xc0002977e0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 53 gp=0xc000584540 m=nil [GC worker (idle)]: runtime.gopark(0x6415ac0622c0?, 0x1?, 0x58?, 0xd7?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000297f38 sp=0xc000297f18 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc000297fc8 sp=0xc000297f38 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000297fe0 sp=0xc000297fc8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000297fe8 sp=0xc000297fe0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 54 gp=0xc000584700 m=nil [GC worker (idle)]: runtime.gopark(0x6415ac0622c0?, 0x1?, 0xe1?, 0xfc?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000298738 sp=0xc000298718 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc0002987c8 sp=0xc000298738 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0002987e0 sp=0xc0002987c8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0002987e8 sp=0xc0002987e0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 55 gp=0xc0005848c0 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf25be81a73?, 0x1?, 0x3?, 0x25?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000298f38 sp=0xc000298f18 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc000298fc8 sp=0xc000298f38 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000298fe0 sp=0xc000298fc8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000298fe8 sp=0xc000298fe0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 56 gp=0xc000584a80 m=nil [GC worker (idle)]: runtime.gopark(0x6415ab73b068?, 0xc0005800e0?, 0x1b?, 0xa?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000299738 sp=0xc000299718 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc0002997c8 sp=0xc000299738 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0002997e0 sp=0xc0002997c8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0002997e8 sp=0xc0002997e0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 57 gp=0xc000584c40 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf245d1b123?, 0x1?, 0xc1?, 0x29?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000299f38 sp=0xc000299f18 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc000299fc8 sp=0xc000299f38 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000299fe0 sp=0xc000299fc8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000299fe8 sp=0xc000299fe0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 58 gp=0xc000584e00 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf245d1a8e0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00058c738 sp=0xc00058c718 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc00058c7c8 sp=0xc00058c738 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00058c7e0 sp=0xc00058c7c8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00058c7e8 sp=0xc00058c7e0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 8 gp=0xc00016efc0 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf245d1ba55?, 0x3?, 0xee?, 0x4?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000093738 sp=0xc000093718 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc0000937c8 sp=0xc000093738 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000937e0 sp=0xc0000937c8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000937e8 sp=0xc0000937e0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 36 gp=0xc000284700 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf24f7d65af?, 0x1?, 0x14?, 0x74?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00029b738 sp=0xc00029b718 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc00029b7c8 sp=0xc00029b738 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00029b7e0 sp=0xc00029b7c8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00029b7e8 sp=0xc00029b7e0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 37 gp=0xc0002848c0 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf245d1c19d?, 0x1?, 0x89?, 0x4c?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00029bf38 sp=0xc00029bf18 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc00029bfc8 sp=0xc00029bf38 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00029bfe0 sp=0xc00029bfc8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00029bfe8 sp=0xc00029bfe0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 38 gp=0xc000284a80 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf245d1ae3a?, 0x1?, 0xe5?, 0x1a?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00029c738 sp=0xc00029c718 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc00029c7c8 sp=0xc00029c738 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00029c7e0 sp=0xc00029c7c8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00029c7e8 sp=0xc00029c7e0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 39 gp=0xc000284c40 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf24f841efe?, 0xc0002920c0?, 0x1b?, 0xa?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00029cf38 sp=0xc00029cf18 pc=0x6415aa4758ee runtime.gcBgMarkWorker(0xc0000c97a0) runtime/mgc.go:1423 +0xe9 fp=0xc00029cfc8 sp=0xc00029cf38 pc=0x6415aa423109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00029cfe0 sp=0xc00029cfc8 pc=0x6415aa422fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00029cfe8 sp=0xc00029cfe0 pc=0x6415aa47d021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 940 gp=0xc003232380 m=nil [IO wait]: runtime.gopark(0x6415aa41a046?, 0xc0041bc808?, 0x0?, 0x0?, 0xb?) runtime/proc.go:435 +0xce fp=0xc00321a5d8 sp=0xc00321a5b8 pc=0x6415aa4758ee runtime.netpollblock(0x6415aa498d78?, 0xaa40f226?, 0x15?) runtime/netpoll.go:575 +0xf7 fp=0xc00321a610 sp=0xc00321a5d8 pc=0x6415aa43a6f7 internal/poll.runtime_pollWait(0x75018525fd98, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc00321a630 sp=0xc00321a610 pc=0x6415aa474b05 internal/poll.(*pollDesc).wait(0xc0004a2000?, 0xc06908e4f1?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00321a658 sp=0xc00321a630 pc=0x6415aa4fbf87 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc0004a2000, {0xc06908e4f1, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x27a fp=0xc00321a6f0 sp=0xc00321a658 pc=0x6415aa4fd27a net.(*netFD).Read(0xc0004a2000, {0xc06908e4f1?, 0xc003002058?, 0xc00321a770?}) net/fd_posix.go:55 +0x25 fp=0xc00321a738 sp=0xc00321a6f0 pc=0x6415aa5721c5 net.(*conn).Read(0xc000094090, {0xc06908e4f1?, 0xc003003880?, 0x6415aa85be40?}) net/net.go:194 +0x45 fp=0xc00321a780 sp=0xc00321a738 pc=0x6415aa580585 net/http.(*connReader).backgroundRead(0xc06908e4e0) net/http/server.go:690 +0x37 fp=0xc00321a7c8 sp=0xc00321a780 pc=0x6415aa76be17 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:686 +0x25 fp=0xc00321a7e0 sp=0xc00321a7c8 pc=0x6415aa76bd45 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00321a7e8 sp=0xc00321a7e0 pc=0x6415aa47d021 created by net/http.(*connReader).startBackgroundRead in goroutine 982 net/http/server.go:686 +0xb6 goroutine 982 gp=0xc06e978540 m=nil [select]: runtime.gopark(0xc00442da68?, 0x2?, 0x0?, 0x65?, 0xc00442d80c?) runtime/proc.go:435 +0xce fp=0xc00442d620 sp=0xc00442d600 pc=0x6415aa4758ee runtime.selectgo(0xc00442da68, 0xc00442d808, 0x821?, 0x0, 0x1?, 0x1) runtime/select.go:351 +0x837 fp=0xc00442d758 sp=0xc00442d620 pc=0x6415aa4541f7 github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc0005c19e0, {0x6415ab74d4f8, 0xc00440c0e0}, 0xc003564140) github.com/ollama/ollama/runner/ollamarunner/runner.go:649 +0xad0 fp=0xc00442dac0 sp=0xc00442d758 pc=0x6415aa900e70 github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x6415ab74d4f8?, 0xc00440c0e0?}, 0xc0002dbb40?) <autogenerated>:1 +0x36 fp=0xc00442daf0 sp=0xc00442dac0 pc=0x6415aa903a36 net/http.HandlerFunc.ServeHTTP(0xc000533140?, {0x6415ab74d4f8?, 0xc00440c0e0?}, 0xc0002dbb60?) net/http/server.go:2294 +0x29 fp=0xc00442db18 sp=0xc00442daf0 pc=0x6415aa773a49 net/http.(*ServeMux).ServeHTTP(0x6415aa41a125?, {0x6415ab74d4f8, 0xc00440c0e0}, 0xc003564140) net/http/server.go:2822 +0x1c4 fp=0xc00442db68 sp=0xc00442db18 pc=0x6415aa775944 net/http.serverHandler.ServeHTTP({0x6415ab749b10?}, {0x6415ab74d4f8?, 0xc00440c0e0?}, 0x1?) net/http/server.go:3301 +0x8e fp=0xc00442db98 sp=0xc00442db68 pc=0x6415aa7933ce net/http.(*conn).serve(0xc0004c0120, {0x6415ab74f5a8, 0xc0004a7bc0}) net/http/server.go:2102 +0x625 fp=0xc00442dfb8 sp=0xc00442db98 pc=0x6415aa771f45 net/http.(*Server).Serve.gowrap3() net/http/server.go:3454 +0x28 fp=0xc00442dfe0 sp=0xc00442dfb8 pc=0x6415aa777808 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00442dfe8 sp=0xc00442dfe0 pc=0x6415aa47d021 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3454 +0x485 rax 0x74fbe8354380 rbx 0x74fbe83542f0 rcx 0x2 rdx 0x74fbb81c0f00 rdi 0x0 rsi 0x74fb9eb33030 rbp 0x74fbb81c0ef0 rsp 0x74fbadffac48 r8 0x4 r9 0xc000094048 r10 0x1 r11 0x206 r12 0x0 r13 0x74fbe801a2c8 r14 0xa52 r15 0x74fbe83542f0 rip 0x6415ab24ffd0 rflags 0x10206 cs 0x33 fs 0x0 gs 0x0 [GIN] 2025/03/12 - 14:23:31 | 200 | 2.505928799s | 172.17.0.2 | POST "/api/chat" time=2025-03-12T14:23:31.575Z level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 2" time=2025-03-12T14:23:36.546Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.191615043 model=/home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 time=2025-03-12T14:23:36.796Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.440854627 model=/home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 time=2025-03-12T14:23:37.108Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.753601007 model=/home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 time=2025-03-12T14:23:37.278Z level=INFO source=server.go:105 msg="system memory" total="62.4 GiB" free="54.7 GiB" free_swap="0 B" time=2025-03-12T14:23:37.279Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB" time=2025-03-12T14:23:37.325Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:23:37.327Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-03-12T14:23:37.328Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:23:37.331Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-03-12T14:23:37.331Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-03-12T14:23:37.331Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-03-12T14:23:37.331Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-03-12T14:23:37.331Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30 time=2025-03-12T14:23:37.331Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-03-12T14:23:37.331Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 --ctx-size 32000 --batch-size 512 --n-gpu-layers 33 --threads 6 --parallel 1 --port 43925" time=2025-03-12T14:23:37.331Z level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-03-12T14:23:37.331Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-03-12T14:23:37.332Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-03-12T14:23:37.337Z level=INFO source=runner.go:882 msg="starting ollama engine" time=2025-03-12T14:23:37.338Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:43925" time=2025-03-12T14:23:37.383Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default="" time=2025-03-12T14:23:37.383Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default="" time=2025-03-12T14:23:37.383Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=36 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 4090 Laptop GPU, compute capability 8.9, VMM: yes load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so time=2025-03-12T14:23:37.420Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-03-12T14:23:37.530Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="4.1 GiB" time=2025-03-12T14:23:37.530Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="4.2 GiB" time=2025-03-12T14:23:37.784Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding" time=2025-03-12T14:23:38.037Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-03-12T14:23:38.463Z level=INFO source=ggml.go:356 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 time=2025-03-12T14:23:38.463Z level=INFO source=ggml.go:356 msg="compute graph" backend=CPU buffer_type=CUDA_Host time=2025-03-12T14:23:38.463Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:23:38.465Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-03-12T14:23:38.466Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:23:38.469Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-03-12T14:23:38.469Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-03-12T14:23:38.469Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-03-12T14:23:38.469Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-03-12T14:23:38.469Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30 time=2025-03-12T14:23:38.469Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-03-12T14:23:38.539Z level=INFO source=server.go:624 msg="llama runner started in 1.21 seconds" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 15554.32 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 16309888000 SIGSEGV: segmentation violation PC=0x60fc5fd56fd0 m=135 sigcode=1 addr=0x58 signal arrived during cgo execution goroutine 9 gp=0xc000103dc0 m=135 mp=0xc1ad8e8008 [syscall]: runtime.cgocall(0x60fc5fdaafe0, 0xc000153b00) runtime/cgocall.go:167 +0x4b fp=0xc000153ad8 sp=0xc000153aa0 pc=0x60fc5ef7960b github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x7ee3b4003430, 0x7ee6281bd2e0) _cgo_gotypes.go:485 +0x4a fp=0xc000153b00 sp=0xc000153ad8 pc=0x60fc5f3641aa github.com/ollama/ollama/ml/backend/ggml.Context.Compute.func1(...) github.com/ollama/ollama/ml/backend/ggml/ggml.go:497 github.com/ollama/ollama/ml/backend/ggml.Context.Compute({0xc000142000, 0x7ee1f0000ba0, 0x7ee6281bd2e0, 0x0, 0x2000}, {0xc21a1c7ca0, 0x1, 0x60fc60266210?}) github.com/ollama/ollama/ml/backend/ggml/ggml.go:497 +0xbd fp=0xc000153b90 sp=0xc000153b00 pc=0x60fc5f36ca7d github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc21a20b410?, {0xc21a1c7ca0?, 0x200?, 0x0?}) <autogenerated>:1 +0x72 fp=0xc000153c08 sp=0xc000153b90 pc=0x60fc5f3724f2 github.com/ollama/ollama/model.Forward({0x60fc6025dc00, 0xc21a20b410}, {0x60fc602552b0, 0xc0003580e0}, {{0xc005ced800, 0x200, 0x200}, {0x0, 0x0, 0x0}, ...}) github.com/ollama/ollama/model/model.go:303 +0x218 fp=0xc000153cf0 sp=0xc000153c08 pc=0x60fc5f399718 github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc00013d9e0) github.com/ollama/ollama/runner/ollamarunner/runner.go:395 +0x3bb fp=0xc000153f98 sp=0xc000153cf0 pc=0x60fc5f405dfb github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc00013d9e0, {0x60fc602565e0, 0xc000524910}) github.com/ollama/ollama/runner/ollamarunner/runner.go:321 +0x4e fp=0xc000153fb8 sp=0xc000153f98 pc=0x60fc5f4059ee github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2() github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0x28 fp=0xc000153fe0 sp=0xc000153fb8 pc=0x60fc5f40a668 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000153fe8 sp=0xc000153fe0 pc=0x60fc5ef84021 created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0xa9c goroutine 1 gp=0xc000002380 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00014f648 sp=0xc00014f628 pc=0x60fc5ef7c8ee runtime.netpollblock(0xc00014f698?, 0x5ef16226?, 0xfc?) runtime/netpoll.go:575 +0xf7 fp=0xc00014f680 sp=0xc00014f648 pc=0x60fc5ef416f7 internal/poll.runtime_pollWait(0x7ee7261a6eb0, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc00014f6a0 sp=0xc00014f680 pc=0x60fc5ef7bb05 internal/poll.(*pollDesc).wait(0xc0005a2800?, 0x900f1fcfe?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00014f6c8 sp=0xc00014f6a0 pc=0x60fc5f002f87 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0xc0005a2800) internal/poll/fd_unix.go:620 +0x295 fp=0xc00014f770 sp=0xc00014f6c8 pc=0x60fc5f008355 net.(*netFD).accept(0xc0005a2800) net/fd_unix.go:172 +0x29 fp=0xc00014f828 sp=0xc00014f770 pc=0x60fc5f07b169 net.(*TCPListener).accept(0xc000522cc0) net/tcpsock_posix.go:159 +0x1b fp=0xc00014f878 sp=0xc00014f828 pc=0x60fc5f090b1b net.(*TCPListener).Accept(0xc000522cc0) net/tcpsock.go:380 +0x30 fp=0xc00014f8a8 sp=0xc00014f878 pc=0x60fc5f08f9d0 net/http.(*onceCloseListener).Accept(0xc0004c2480?) <autogenerated>:1 +0x24 fp=0xc00014f8c0 sp=0xc00014f8a8 pc=0x60fc5f2a6b44 net/http.(*Server).Serve(0xc000511100, {0x60fc60254318, 0xc000522cc0}) net/http/server.go:3424 +0x30c fp=0xc00014f9f0 sp=0xc00014f8c0 pc=0x60fc5f27e40c github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000034150, 0xe, 0xf}) github.com/ollama/ollama/runner/ollamarunner/runner.go:939 +0xe6a fp=0xc00014fd08 sp=0xc00014f9f0 pc=0x60fc5f40a3aa github.com/ollama/ollama/runner.Execute({0xc000034130?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc00014fd30 sp=0xc00014fd08 pc=0x60fc5f40af09 github.com/ollama/ollama/cmd.NewCLI.func2(0xc000510f00?, {0x60fc5fdc6054?, 0x4?, 0x60fc5fdc6058?}) github.com/ollama/ollama/cmd/cmd.go:1285 +0x45 fp=0xc00014fd58 sp=0xc00014fd30 pc=0x60fc5fb7b6a5 github.com/spf13/cobra.(*Command).execute(0xc0004c4f08, {0xc000144c30, 0xf, 0xf}) github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc00014fe78 sp=0xc00014fd58 pc=0x60fc5f0f42fc github.com/spf13/cobra.(*Command).ExecuteC(0xc0004ae908) github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc00014ff30 sp=0xc00014fe78 pc=0x60fc5f0f4b45 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x4d fp=0xc00014ff50 sp=0xc00014ff30 pc=0x60fc5fb7ba0d runtime.main() runtime/proc.go:283 +0x29d fp=0xc00014ffe0 sp=0xc00014ff50 pc=0x60fc5ef48cfd runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00014ffe8 sp=0xc00014ffe0 pc=0x60fc5ef84021 goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000090fa8 sp=0xc000090f88 pc=0x60fc5ef7c8ee runtime.goparkunlock(...) runtime/proc.go:441 runtime.forcegchelper() runtime/proc.go:348 +0xb8 fp=0xc000090fe0 sp=0xc000090fa8 pc=0x60fc5ef49038 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000090fe8 sp=0xc000090fe0 pc=0x60fc5ef84021 created by runtime.init.7 in goroutine 1 runtime/proc.go:336 +0x1a goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000091780 sp=0xc000091760 pc=0x60fc5ef7c8ee runtime.goparkunlock(...) runtime/proc.go:441 runtime.bgsweep(0xc000042100) runtime/mgcsweep.go:316 +0xdf fp=0xc0000917c8 sp=0xc000091780 pc=0x60fc5ef3385f runtime.gcenable.gowrap1() runtime/mgc.go:204 +0x25 fp=0xc0000917e0 sp=0xc0000917c8 pc=0x60fc5ef27c45 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000917e8 sp=0xc0000917e0 pc=0x60fc5ef84021 created by runtime.gcenable in goroutine 1 runtime/mgc.go:204 +0x66 goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]: runtime.gopark(0x60fc60abab40?, 0x60fc5ff7ca00?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000091f78 sp=0xc000091f58 pc=0x60fc5ef7c8ee runtime.goparkunlock(...) runtime/proc.go:441 runtime.(*scavengerState).park(0x60fc60abab40) runtime/mgcscavenge.go:425 +0x49 fp=0xc000091fa8 sp=0xc000091f78 pc=0x60fc5ef312a9 runtime.bgscavenge(0xc000042100) runtime/mgcscavenge.go:658 +0x59 fp=0xc000091fc8 sp=0xc000091fa8 pc=0x60fc5ef31839 runtime.gcenable.gowrap2() runtime/mgc.go:205 +0x25 fp=0xc000091fe0 sp=0xc000091fc8 pc=0x60fc5ef27be5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000091fe8 sp=0xc000091fe0 pc=0x60fc5ef84021 created by runtime.gcenable in goroutine 1 runtime/mgc.go:205 +0xa5 goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]: runtime.gopark(0x1b8?, 0x60fc5ef4b2a9?, 0x1?, 0x23?, 0xc000090688?) runtime/proc.go:435 +0xce fp=0xc000090630 sp=0xc000090610 pc=0x60fc5ef7c8ee runtime.runfinq() runtime/mfinal.go:196 +0x107 fp=0xc0000907e0 sp=0xc000090630 pc=0x60fc5ef26c07 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000907e8 sp=0xc0000907e0 pc=0x60fc5ef84021 created by runtime.createfing in goroutine 1 runtime/mfinal.go:166 +0x3d goroutine 6 gp=0xc0001f08c0 m=nil [chan receive]: runtime.gopark(0xc0001979a0?, 0xc0032fc030?, 0x60?, 0x27?, 0x60fc5f061ea8?) runtime/proc.go:435 +0xce fp=0xc000092718 sp=0xc0000926f8 pc=0x60fc5ef7c8ee runtime.chanrecv(0xc0000c8380, 0x0, 0x1) runtime/chan.go:664 +0x445 fp=0xc000092790 sp=0xc000092718 pc=0x60fc5ef18e05 runtime.chanrecv1(0x0?, 0x0?) runtime/chan.go:506 +0x12 fp=0xc0000927b8 sp=0xc000092790 pc=0x60fc5ef18992 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() runtime/mgc.go:1799 +0x2f fp=0xc0000927e0 sp=0xc0000927b8 pc=0x60fc5ef2adef runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000927e8 sp=0xc0000927e0 pc=0x60fc5ef84021 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 runtime/mgc.go:1794 +0x85 goroutine 7 gp=0xc0001f0e00 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf424c610e4?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000092f38 sp=0xc000092f18 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc000092fc8 sp=0xc000092f38 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000092fe0 sp=0xc000092fc8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000092fe8 sp=0xc000092fe0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 18 gp=0xc000504000 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf424c612d9?, 0x3?, 0xf9?, 0xfc?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008c738 sp=0xc00008c718 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00008c7c8 sp=0xc00008c738 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008c7e0 sp=0xc00008c7c8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008c7e8 sp=0xc00008c7e0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 19 gp=0xc0005041c0 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf424c61029?, 0x3?, 0xea?, 0x7e?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008cf38 sp=0xc00008cf18 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00008cfc8 sp=0xc00008cf38 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008cfe0 sp=0xc00008cfc8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008cfe8 sp=0xc00008cfe0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 20 gp=0xc000504380 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf424c60fe9?, 0x3?, 0x7a?, 0x8d?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008d738 sp=0xc00008d718 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00008d7c8 sp=0xc00008d738 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008d7e0 sp=0xc00008d7c8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008d7e8 sp=0xc00008d7e0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 21 gp=0xc000504540 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf424c615d6?, 0x3?, 0x9?, 0x76?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008df38 sp=0xc00008df18 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00008dfc8 sp=0xc00008df38 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008dfe0 sp=0xc00008dfc8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 22 gp=0xc000504700 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf434724ef4?, 0x1?, 0xe6?, 0x8?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008e738 sp=0xc00008e718 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00008e7c8 sp=0xc00008e738 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008e7e0 sp=0xc00008e7c8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008e7e8 sp=0xc00008e7e0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 23 gp=0xc0005048c0 m=nil [GC worker (idle)]: runtime.gopark(0x60fc60b692c0?, 0x1?, 0x80?, 0x85?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008ef38 sp=0xc00008ef18 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00008efc8 sp=0xc00008ef38 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008efe0 sp=0xc00008efc8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008efe8 sp=0xc00008efe0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 24 gp=0xc000504a80 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf424c5fe6a?, 0x3?, 0xd5?, 0xab?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008f738 sp=0xc00008f718 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00008f7c8 sp=0xc00008f738 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008f7e0 sp=0xc00008f7c8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008f7e8 sp=0xc00008f7e0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 25 gp=0xc000504c40 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf424c6114d?, 0x1?, 0xcf?, 0x37?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008ff38 sp=0xc00008ff18 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00008ffc8 sp=0xc00008ff38 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008ffe0 sp=0xc00008ffc8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008ffe8 sp=0xc00008ffe0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 26 gp=0xc000504e00 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf424c6108f?, 0x3?, 0xd2?, 0xb8?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00050c738 sp=0xc00050c718 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00050c7c8 sp=0xc00050c738 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00050c7e0 sp=0xc00050c7c8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00050c7e8 sp=0xc00050c7e0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 27 gp=0xc000504fc0 m=nil [GC worker (idle)]: runtime.gopark(0x60fc60b692c0?, 0x1?, 0xf5?, 0x2c?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00050cf38 sp=0xc00050cf18 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00050cfc8 sp=0xc00050cf38 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00050cfe0 sp=0xc00050cfc8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00050cfe8 sp=0xc00050cfe0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 34 gp=0xc000102700 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf424c60ffd?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000508738 sp=0xc000508718 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc0005087c8 sp=0xc000508738 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0005087e0 sp=0xc0005087c8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0005087e8 sp=0xc0005087e0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 35 gp=0xc0001028c0 m=nil [GC worker (idle)]: runtime.gopark(0x60fc60b692c0?, 0x3?, 0x9c?, 0x45?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000508f38 sp=0xc000508f18 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc000508fc8 sp=0xc000508f38 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000508fe0 sp=0xc000508fc8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000508fe8 sp=0xc000508fe0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 36 gp=0xc000102a80 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf424c61173?, 0x3?, 0x1d?, 0x40?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000509738 sp=0xc000509718 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc0005097c8 sp=0xc000509738 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0005097e0 sp=0xc0005097c8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0005097e8 sp=0xc0005097e0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 37 gp=0xc000102c40 m=nil [GC worker (idle)]: runtime.gopark(0x60fc60b692c0?, 0x1?, 0x35?, 0x12?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000509f38 sp=0xc000509f18 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc000509fc8 sp=0xc000509f38 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000509fe0 sp=0xc000509fc8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000509fe8 sp=0xc000509fe0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 38 gp=0xc000102e00 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf4325936f0?, 0x1?, 0x1f?, 0xdb?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00050a738 sp=0xc00050a718 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00050a7c8 sp=0xc00050a738 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00050a7e0 sp=0xc00050a7c8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00050a7e8 sp=0xc00050a7e0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 39 gp=0xc000102fc0 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf424c614a6?, 0x1?, 0xa0?, 0x20?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00050af38 sp=0xc00050af18 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00050afc8 sp=0xc00050af38 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00050afe0 sp=0xc00050afc8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00050afe8 sp=0xc00050afe0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 50 gp=0xc000584000 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf432593e51?, 0x1?, 0xdf?, 0xee?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00058a738 sp=0xc00058a718 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00058a7c8 sp=0xc00058a738 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00058a7e0 sp=0xc00058a7c8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00058a7e8 sp=0xc00058a7e0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 51 gp=0xc0005841c0 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf432594712?, 0x1?, 0xd5?, 0xf5?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00058af38 sp=0xc00058af18 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00058afc8 sp=0xc00058af38 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00058afe0 sp=0xc00058afc8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00058afe8 sp=0xc00058afe0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 52 gp=0xc000584380 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf432613ecb?, 0x1?, 0x91?, 0xeb?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00058b738 sp=0xc00058b718 pc=0x60fc5ef7c8ee runtime.gcBgMarkWorker(0xc0000c9960) runtime/mgc.go:1423 +0xe9 fp=0xc00058b7c8 sp=0xc00058b738 pc=0x60fc5ef2a109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00058b7e0 sp=0xc00058b7c8 pc=0x60fc5ef29fe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00058b7e8 sp=0xc00058b7e0 pc=0x60fc5ef84021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 1138 gp=0xc003132700 m=nil [select]: runtime.gopark(0xc038221a68?, 0x2?, 0x0?, 0x8b?, 0xc03822180c?) runtime/proc.go:435 +0xce fp=0xc038221620 sp=0xc038221600 pc=0x60fc5ef7c8ee runtime.selectgo(0xc038221a68, 0xc038221808, 0x901?, 0x0, 0x1?, 0x1) runtime/select.go:351 +0x837 fp=0xc038221758 sp=0xc038221620 pc=0x60fc5ef5b1f7 github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc00013d9e0, {0x60fc602544f8, 0xc005cea1c0}, 0xc000128280) github.com/ollama/ollama/runner/ollamarunner/runner.go:649 +0xad0 fp=0xc038221ac0 sp=0xc038221758 pc=0x60fc5f407e70 github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x60fc602544f8?, 0xc005cea1c0?}, 0xc000153b40?) <autogenerated>:1 +0x36 fp=0xc038221af0 sp=0xc038221ac0 pc=0x60fc5f40aa36 net/http.HandlerFunc.ServeHTTP(0xc000141500?, {0x60fc602544f8?, 0xc005cea1c0?}, 0xc000153b60?) net/http/server.go:2294 +0x29 fp=0xc038221b18 sp=0xc038221af0 pc=0x60fc5f27aa49 net/http.(*ServeMux).ServeHTTP(0x60fc5ef21125?, {0x60fc602544f8, 0xc005cea1c0}, 0xc000128280) net/http/server.go:2822 +0x1c4 fp=0xc038221b68 sp=0xc038221b18 pc=0x60fc5f27c944 net/http.serverHandler.ServeHTTP({0x60fc60250b10?}, {0x60fc602544f8?, 0xc005cea1c0?}, 0x1?) net/http/server.go:3301 +0x8e fp=0xc038221b98 sp=0xc038221b68 pc=0x60fc5f29a3ce net/http.(*conn).serve(0xc0004c2480, {0x60fc602565a8, 0xc000688d50}) net/http/server.go:2102 +0x625 fp=0xc038221fb8 sp=0xc038221b98 pc=0x60fc5f278f45 net/http.(*Server).Serve.gowrap3() net/http/server.go:3454 +0x28 fp=0xc038221fe0 sp=0xc038221fb8 pc=0x60fc5f27e808 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc038221fe8 sp=0xc038221fe0 pc=0x60fc5ef84021 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3454 +0x485 goroutine 30 gp=0xc0032ff6c0 m=nil [IO wait]: runtime.gopark(0xc004883df8?, 0x60fc5ef83c7c?, 0x20?, 0x3e?, 0xb?) runtime/proc.go:435 +0xce fp=0xc004883dd8 sp=0xc004883db8 pc=0x60fc5ef7c8ee runtime.netpollblock(0x60fc5ef9fd78?, 0x5ef16226?, 0xfc?) runtime/netpoll.go:575 +0xf7 fp=0xc004883e10 sp=0xc004883dd8 pc=0x60fc5ef416f7 internal/poll.runtime_pollWait(0x7ee7261a6d98, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc004883e30 sp=0xc004883e10 pc=0x60fc5ef7bb05 internal/poll.(*pollDesc).wait(0xc0005a2080?, 0xc00652c821?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc004883e58 sp=0xc004883e30 pc=0x60fc5f002f87 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc0005a2080, {0xc00652c821, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x27a fp=0xc004883ef0 sp=0xc004883e58 pc=0x60fc5f00427a net.(*netFD).Read(0xc0005a2080, {0xc00652c821?, 0xc0030ae1d8?, 0xc004883f70?}) net/fd_posix.go:55 +0x25 fp=0xc004883f38 sp=0xc004883ef0 pc=0x60fc5f0791c5 net.(*conn).Read(0xc000134190, {0xc00652c821?, 0xc0030af880?, 0x60fc5f362e40?}) net/net.go:194 +0x45 fp=0xc004883f80 sp=0xc004883f38 pc=0x60fc5f087585 net/http.(*connReader).backgroundRead(0xc00652c810) net/http/server.go:690 +0x37 fp=0xc004883fc8 sp=0xc004883f80 pc=0x60fc5f272e17 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:686 +0x25 fp=0xc004883fe0 sp=0xc004883fc8 pc=0x60fc5f272d45 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc004883fe8 sp=0xc004883fe0 pc=0x60fc5ef84021 created by net/http.(*connReader).startBackgroundRead in goroutine 1138 net/http/server.go:686 +0xb6 rax 0x7ee3b433d620 rbx 0x7ee3b433d590 rcx 0x2 rdx 0x7ee1f01beac0 rdi 0x0 rsi 0x7ee18cb3f030 rbp 0x7ee1f01beab0 rsp 0x7ee20affcc48 r8 0x4 r9 0xc000094048 r10 0x1 r11 0x206 r12 0x0 r13 0x7ee3b4003588 r14 0xa52 r15 0x7ee3b433d590 rip 0x60fc5fd56fd0 rflags 0x10206 cs 0x33 fs 0x0 gs 0x0 [GIN] 2025/03/12 - 14:23:39 | 500 | 8.107990206s | 172.17.0.2 | POST "/api/chat" time=2025-03-12T14:23:39.716Z level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 2" time=2025-03-12T14:23:44.670Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.197443181 model=/home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 time=2025-03-12T14:23:44.921Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.447655667 model=/home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 time=2025-03-12T14:23:45.180Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.706586906 model=/home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 time=2025-03-12T14:23:45.354Z level=INFO source=server.go:105 msg="system memory" total="62.4 GiB" free="54.1 GiB" free_swap="0 B" time=2025-03-12T14:23:45.354Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB" time=2025-03-12T14:23:45.403Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:23:45.405Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-03-12T14:23:45.406Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:23:45.408Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-03-12T14:23:45.409Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-03-12T14:23:45.409Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-03-12T14:23:45.409Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-03-12T14:23:45.409Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30 time=2025-03-12T14:23:45.409Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-03-12T14:23:45.409Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 --ctx-size 32000 --batch-size 512 --n-gpu-layers 33 --threads 6 --parallel 1 --port 33003" time=2025-03-12T14:23:45.409Z level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-03-12T14:23:45.409Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-03-12T14:23:45.409Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-03-12T14:23:45.417Z level=INFO source=runner.go:882 msg="starting ollama engine" time=2025-03-12T14:23:45.417Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:33003" time=2025-03-12T14:23:45.465Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default="" time=2025-03-12T14:23:45.465Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default="" time=2025-03-12T14:23:45.465Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=36 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 4090 Laptop GPU, compute capability 8.9, VMM: yes load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so time=2025-03-12T14:23:45.502Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-03-12T14:23:45.610Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="4.2 GiB" time=2025-03-12T14:23:45.610Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="4.1 GiB" time=2025-03-12T14:23:45.673Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-03-12T14:23:46.577Z level=INFO source=ggml.go:356 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 time=2025-03-12T14:23:46.577Z level=INFO source=ggml.go:356 msg="compute graph" backend=CPU buffer_type=CUDA_Host time=2025-03-12T14:23:46.577Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:23:46.578Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-03-12T14:23:46.580Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:23:46.582Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-03-12T14:23:46.582Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-03-12T14:23:46.582Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-03-12T14:23:46.582Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-03-12T14:23:46.582Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30 time=2025-03-12T14:23:46.582Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-03-12T14:23:46.683Z level=INFO source=server.go:624 msg="llama runner started in 1.27 seconds" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 15554.32 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 16309888000 SIGSEGV: segmentation violation PC=0x64a7b881cfd0 m=43 sigcode=1 addr=0x58 signal arrived during cgo execution goroutine 15 gp=0xc00048ae00 m=43 mp=0xc004f00008 [syscall]: runtime.cgocall(0x64a7b8870fe0, 0xc0005f7b00) runtime/cgocall.go:167 +0x4b fp=0xc0005f7ad8 sp=0xc0005f7aa0 pc=0x64a7b7a3f60b github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x7e6eb8003770, 0x7e73062bf2e0) _cgo_gotypes.go:485 +0x4a fp=0xc0005f7b00 sp=0xc0005f7ad8 pc=0x64a7b7e2a1aa github.com/ollama/ollama/ml/backend/ggml.Context.Compute.func1(...) github.com/ollama/ollama/ml/backend/ggml/ggml.go:497 github.com/ollama/ollama/ml/backend/ggml.Context.Compute({0xc00043c040, 0x7e707c002d40, 0x7e73062bf2e0, 0x0, 0x2000}, {0xc0050d0e00, 0x1, 0x64a7b8d2c210?}) github.com/ollama/ollama/ml/backend/ggml/ggml.go:497 +0xbd fp=0xc0005f7b90 sp=0xc0005f7b00 pc=0x64a7b7e32a7d github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc005103b30?, {0xc0050d0e00?, 0x200?, 0x0?}) <autogenerated>:1 +0x72 fp=0xc0005f7c08 sp=0xc0005f7b90 pc=0x64a7b7e384f2 github.com/ollama/ollama/model.Forward({0x64a7b8d23c00, 0xc005103b30}, {0x64a7b8d1b2b0, 0xc0000e8000}, {{0xc004b69800, 0x200, 0x200}, {0x0, 0x0, 0x0}, ...}) github.com/ollama/ollama/model/model.go:303 +0x218 fp=0xc0005f7cf0 sp=0xc0005f7c08 pc=0x64a7b7e5f718 github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc000035d40) github.com/ollama/ollama/runner/ollamarunner/runner.go:395 +0x3bb fp=0xc0005f7f98 sp=0xc0005f7cf0 pc=0x64a7b7ecbdfb github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000035d40, {0x64a7b8d1c5e0, 0xc000611540}) github.com/ollama/ollama/runner/ollamarunner/runner.go:321 +0x4e fp=0xc0005f7fb8 sp=0xc0005f7f98 pc=0x64a7b7ecb9ee github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2() github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0x28 fp=0xc0005f7fe0 sp=0xc0005f7fb8 pc=0x64a7b7ed0668 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0005f7fe8 sp=0xc0005f7fe0 pc=0x64a7b7a4a021 created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0xa9c goroutine 1 gp=0xc000002380 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc0005f3648 sp=0xc0005f3628 pc=0x64a7b7a428ee runtime.netpollblock(0xc0005f3698?, 0xb79dc226?, 0xa7?) runtime/netpoll.go:575 +0xf7 fp=0xc0005f3680 sp=0xc0005f3648 pc=0x64a7b7a076f7 internal/poll.runtime_pollWait(0x7e7405985de0, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc0005f36a0 sp=0xc0005f3680 pc=0x64a7b7a41b05 internal/poll.(*pollDesc).wait(0xc0005b2600?, 0x9009e5cfe?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0005f36c8 sp=0xc0005f36a0 pc=0x64a7b7ac8f87 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0xc0005b2600) internal/poll/fd_unix.go:620 +0x295 fp=0xc0005f3770 sp=0xc0005f36c8 pc=0x64a7b7ace355 net.(*netFD).accept(0xc0005b2600) net/fd_unix.go:172 +0x29 fp=0xc0005f3828 sp=0xc0005f3770 pc=0x64a7b7b41169 net.(*TCPListener).accept(0xc0000bcf80) net/tcpsock_posix.go:159 +0x1b fp=0xc0005f3878 sp=0xc0005f3828 pc=0x64a7b7b56b1b net.(*TCPListener).Accept(0xc0000bcf80) net/tcpsock.go:380 +0x30 fp=0xc0005f38a8 sp=0xc0005f3878 pc=0x64a7b7b559d0 net/http.(*onceCloseListener).Accept(0xc0005c2120?) <autogenerated>:1 +0x24 fp=0xc0005f38c0 sp=0xc0005f38a8 pc=0x64a7b7d6cb44 net/http.(*Server).Serve(0xc0001f9400, {0x64a7b8d1a318, 0xc0000bcf80}) net/http/server.go:3424 +0x30c fp=0xc0005f39f0 sp=0xc0005f38c0 pc=0x64a7b7d4440c github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000132030, 0xe, 0xf}) github.com/ollama/ollama/runner/ollamarunner/runner.go:939 +0xe6a fp=0xc0005f3d08 sp=0xc0005f39f0 pc=0x64a7b7ed03aa github.com/ollama/ollama/runner.Execute({0xc000132010?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc0005f3d30 sp=0xc0005f3d08 pc=0x64a7b7ed0f09 github.com/ollama/ollama/cmd.NewCLI.func2(0xc0001f9200?, {0x64a7b888c054?, 0x4?, 0x64a7b888c058?}) github.com/ollama/ollama/cmd/cmd.go:1285 +0x45 fp=0xc0005f3d58 sp=0xc0005f3d30 pc=0x64a7b86416a5 github.com/spf13/cobra.(*Command).execute(0xc0005c4f08, {0xc00067eff0, 0xf, 0xf}) github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc0005f3e78 sp=0xc0005f3d58 pc=0x64a7b7bba2fc github.com/spf13/cobra.(*Command).ExecuteC(0xc0005ac908) github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc0005f3f30 sp=0xc0005f3e78 pc=0x64a7b7bbab45 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x4d fp=0xc0005f3f50 sp=0xc0005f3f30 pc=0x64a7b8641a0d runtime.main() runtime/proc.go:283 +0x29d fp=0xc0005f3fe0 sp=0xc0005f3f50 pc=0x64a7b7a0ecfd runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0005f3fe8 sp=0xc0005f3fe0 pc=0x64a7b7a4a021 goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000090fa8 sp=0xc000090f88 pc=0x64a7b7a428ee runtime.goparkunlock(...) runtime/proc.go:441 runtime.forcegchelper() runtime/proc.go:348 +0xb8 fp=0xc000090fe0 sp=0xc000090fa8 pc=0x64a7b7a0f038 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000090fe8 sp=0xc000090fe0 pc=0x64a7b7a4a021 created by runtime.init.7 in goroutine 1 runtime/proc.go:336 +0x1a goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000091780 sp=0xc000091760 pc=0x64a7b7a428ee runtime.goparkunlock(...) runtime/proc.go:441 runtime.bgsweep(0xc000042100) runtime/mgcsweep.go:316 +0xdf fp=0xc0000917c8 sp=0xc000091780 pc=0x64a7b79f985f runtime.gcenable.gowrap1() runtime/mgc.go:204 +0x25 fp=0xc0000917e0 sp=0xc0000917c8 pc=0x64a7b79edc45 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000917e8 sp=0xc0000917e0 pc=0x64a7b7a4a021 created by runtime.gcenable in goroutine 1 runtime/mgc.go:204 +0x66 goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x64a7b8a42a00?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000091f78 sp=0xc000091f58 pc=0x64a7b7a428ee runtime.goparkunlock(...) runtime/proc.go:441 runtime.(*scavengerState).park(0x64a7b9580b40) runtime/mgcscavenge.go:425 +0x49 fp=0xc000091fa8 sp=0xc000091f78 pc=0x64a7b79f72a9 runtime.bgscavenge(0xc000042100) runtime/mgcscavenge.go:658 +0x59 fp=0xc000091fc8 sp=0xc000091fa8 pc=0x64a7b79f7839 runtime.gcenable.gowrap2() runtime/mgc.go:205 +0x25 fp=0xc000091fe0 sp=0xc000091fc8 pc=0x64a7b79edbe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000091fe8 sp=0xc000091fe0 pc=0x64a7b7a4a021 created by runtime.gcenable in goroutine 1 runtime/mgc.go:205 +0xa5 goroutine 18 gp=0xc000102700 m=nil [finalizer wait]: runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000090688?) runtime/proc.go:435 +0xce fp=0xc000090630 sp=0xc000090610 pc=0x64a7b7a428ee runtime.runfinq() runtime/mfinal.go:196 +0x107 fp=0xc0000907e0 sp=0xc000090630 pc=0x64a7b79ecc07 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000907e8 sp=0xc0000907e0 pc=0x64a7b7a4a021 created by runtime.createfing in goroutine 1 runtime/mfinal.go:166 +0x3d goroutine 19 gp=0xc000103180 m=nil [chan receive]: runtime.gopark(0xc0002297c0?, 0xc00016c180?, 0x60?, 0xc7?, 0x64a7b7b27ea8?) runtime/proc.go:435 +0xce fp=0xc00008c718 sp=0xc00008c6f8 pc=0x64a7b7a428ee runtime.chanrecv(0xc000110380, 0x0, 0x1) runtime/chan.go:664 +0x445 fp=0xc00008c790 sp=0xc00008c718 pc=0x64a7b79dee05 runtime.chanrecv1(0x0?, 0x0?) runtime/chan.go:506 +0x12 fp=0xc00008c7b8 sp=0xc00008c790 pc=0x64a7b79de992 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() runtime/mgc.go:1799 +0x2f fp=0xc00008c7e0 sp=0xc00008c7b8 pc=0x64a7b79f0def runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008c7e8 sp=0xc00008c7e0 pc=0x64a7b7a4a021 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 runtime/mgc.go:1794 +0x85 goroutine 20 gp=0xc000103500 m=nil [GC worker (idle)]: runtime.gopark(0x64a7b962f2c0?, 0x1?, 0xf6?, 0x45?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008cf38 sp=0xc00008cf18 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc00008cfc8 sp=0xc00008cf38 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008cfe0 sp=0xc00008cfc8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008cfe8 sp=0xc00008cfe0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 21 gp=0xc0001036c0 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf62c640f7e?, 0x1?, 0x87?, 0xaa?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008d738 sp=0xc00008d718 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc00008d7c8 sp=0xc00008d738 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008d7e0 sp=0xc00008d7c8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008d7e8 sp=0xc00008d7e0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 22 gp=0xc000103880 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf60fff63af?, 0x3?, 0x51?, 0x97?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008df38 sp=0xc00008df18 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc00008dfc8 sp=0xc00008df38 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008dfe0 sp=0xc00008dfc8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 23 gp=0xc000103a40 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf60fff7152?, 0x3?, 0x93?, 0xf7?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008e738 sp=0xc00008e718 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc00008e7c8 sp=0xc00008e738 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008e7e0 sp=0xc00008e7c8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008e7e8 sp=0xc00008e7e0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 24 gp=0xc000103c00 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf62d507189?, 0x1?, 0x4e?, 0x73?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008ef38 sp=0xc00008ef18 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc00008efc8 sp=0xc00008ef38 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008efe0 sp=0xc00008efc8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008efe8 sp=0xc00008efe0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 25 gp=0xc000103dc0 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf62d05f63a?, 0x1?, 0x24?, 0x11?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008f738 sp=0xc00008f718 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc00008f7c8 sp=0xc00008f738 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008f7e0 sp=0xc00008f7c8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008f7e8 sp=0xc00008f7e0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 26 gp=0xc000478000 m=nil [GC worker (idle)]: runtime.gopark(0x64a7b962f2c0?, 0x1?, 0x3a?, 0x7?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008ff38 sp=0xc00008ff18 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc00008ffc8 sp=0xc00008ff38 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00008ffe0 sp=0xc00008ffc8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008ffe8 sp=0xc00008ffe0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 34 gp=0xc0003b8000 m=nil [GC worker (idle)]: runtime.gopark(0x64a7b962f2c0?, 0x1?, 0xd3?, 0x46?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000484738 sp=0xc000484718 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc0004847c8 sp=0xc000484738 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0004847e0 sp=0xc0004847c8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0004847e8 sp=0xc0004847e0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 35 gp=0xc0003b81c0 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf60ffee25c?, 0x1?, 0x79?, 0x74?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000484f38 sp=0xc000484f18 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc000484fc8 sp=0xc000484f38 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000484fe0 sp=0xc000484fc8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000484fe8 sp=0xc000484fe0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 36 gp=0xc0003b8380 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf62d05e2ed?, 0x1?, 0xb7?, 0x49?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000485738 sp=0xc000485718 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc0004857c8 sp=0xc000485738 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0004857e0 sp=0xc0004857c8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0004857e8 sp=0xc0004857e0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 37 gp=0xc0003b8540 m=nil [GC worker (idle)]: runtime.gopark(0x64a7b962f2c0?, 0x3?, 0x7?, 0x4f?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000485f38 sp=0xc000485f18 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc000485fc8 sp=0xc000485f38 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000485fe0 sp=0xc000485fc8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000485fe8 sp=0xc000485fe0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 5 gp=0xc000003a40 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf60fff4eb4?, 0x3?, 0x54?, 0x29?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000092738 sp=0xc000092718 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc0000927c8 sp=0xc000092738 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000927e0 sp=0xc0000927c8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000927e8 sp=0xc0000927e0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 6 gp=0xc000003c00 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf610034dde?, 0x3?, 0x8d?, 0x13?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000092f38 sp=0xc000092f18 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc000092fc8 sp=0xc000092f38 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000092fe0 sp=0xc000092fc8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000092fe8 sp=0xc000092fe0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 7 gp=0xc000003dc0 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf60fff0551?, 0x3?, 0xfd?, 0x35?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000093738 sp=0xc000093718 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc0000937c8 sp=0xc000093738 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000937e0 sp=0xc0000937c8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000937e8 sp=0xc0000937e0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 8 gp=0xc0000c8000 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf62c5331e9?, 0x1?, 0x8d?, 0xe5?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000093f38 sp=0xc000093f18 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc000093fc8 sp=0xc000093f38 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000093fe0 sp=0xc000093fc8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000093fe8 sp=0xc000093fe0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 9 gp=0xc0000c81c0 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf610030717?, 0x1?, 0xaa?, 0x25?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000480738 sp=0xc000480718 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc0004807c8 sp=0xc000480738 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0004807e0 sp=0xc0004807c8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0004807e8 sp=0xc0004807e0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 10 gp=0xc0000c8380 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf62ca87c98?, 0x1?, 0xfb?, 0xc?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000480f38 sp=0xc000480f18 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc000480fc8 sp=0xc000480f38 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000480fe0 sp=0xc000480fc8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000480fe8 sp=0xc000480fe0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 11 gp=0xc0000c8540 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf60ffe5948?, 0x1?, 0x23?, 0xa2?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000481738 sp=0xc000481718 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc0004817c8 sp=0xc000481738 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0004817e0 sp=0xc0004817c8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0004817e8 sp=0xc0004817e0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 12 gp=0xc0000c8700 m=nil [GC worker (idle)]: runtime.gopark(0x3fbf62c420cca?, 0x1?, 0xca?, 0xc?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000481f38 sp=0xc000481f18 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc000481fc8 sp=0xc000481f38 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000481fe0 sp=0xc000481fc8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000481fe8 sp=0xc000481fe0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 13 gp=0xc0000c88c0 m=nil [GC worker (idle)]: runtime.gopark(0x64a7b962f2c0?, 0x1?, 0x76?, 0x60?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000482738 sp=0xc000482718 pc=0x64a7b7a428ee runtime.gcBgMarkWorker(0xc0001117a0) runtime/mgc.go:1423 +0xe9 fp=0xc0004827c8 sp=0xc000482738 pc=0x64a7b79f0109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0004827e0 sp=0xc0004827c8 pc=0x64a7b79effe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0004827e8 sp=0xc0004827e0 pc=0x64a7b7a4a021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 1170 gp=0xc00048ac40 m=nil [IO wait]: runtime.gopark(0x64a7b79e7046?, 0xc000489808?, 0x0?, 0xc0?, 0xb?) runtime/proc.go:435 +0xce fp=0xc004c23dd8 sp=0xc004c23db8 pc=0x64a7b7a428ee runtime.netpollblock(0x64a7b7a65d78?, 0xb79dc226?, 0xa7?) runtime/netpoll.go:575 +0xf7 fp=0xc004c23e10 sp=0xc004c23dd8 pc=0x64a7b7a076f7 internal/poll.runtime_pollWait(0x7e7405985cc8, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc004c23e30 sp=0xc004c23e10 pc=0x64a7b7a41b05 internal/poll.(*pollDesc).wait(0xc0005b2000?, 0xc003b007f1?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc004c23e58 sp=0xc004c23e30 pc=0x64a7b7ac8f87 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc0005b2000, {0xc003b007f1, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x27a fp=0xc004c23ef0 sp=0xc004c23e58 pc=0x64a7b7aca27a net.(*netFD).Read(0xc0005b2000, {0xc003b007f1?, 0x0?, 0x0?}) net/fd_posix.go:55 +0x25 fp=0xc004c23f38 sp=0xc004c23ef0 pc=0x64a7b7b3f1c5 net.(*conn).Read(0xc003aa80c8, {0xc003b007f1?, 0xc00313b880?, 0x64a7b7e28e40?}) net/net.go:194 +0x45 fp=0xc004c23f80 sp=0xc004c23f38 pc=0x64a7b7b4d585 net/http.(*connReader).backgroundRead(0xc003b007e0) net/http/server.go:690 +0x37 fp=0xc004c23fc8 sp=0xc004c23f80 pc=0x64a7b7d38e17 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:686 +0x25 fp=0xc004c23fe0 sp=0xc004c23fc8 pc=0x64a7b7d38d45 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc004c23fe8 sp=0xc004c23fe0 pc=0x64a7b7a4a021 created by net/http.(*connReader).startBackgroundRead in goroutine 914 net/http/server.go:686 +0xb6 goroutine 914 gp=0xc000502fc0 m=nil [select]: runtime.gopark(0xc00004ba68?, 0x2?, 0x0?, 0x5?, 0xc00004b80c?) runtime/proc.go:435 +0xce fp=0xc00004b620 sp=0xc00004b600 pc=0x64a7b7a428ee runtime.selectgo(0xc00004ba68, 0xc00004b808, 0x8d9?, 0x0, 0x1?, 0x1) runtime/select.go:351 +0x837 fp=0xc00004b758 sp=0xc00004b620 pc=0x64a7b7a211f7 github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc000035d40, {0x64a7b8d1a4f8, 0xc004b66000}, 0xc004ac5400) github.com/ollama/ollama/runner/ollamarunner/runner.go:649 +0xad0 fp=0xc00004bac0 sp=0xc00004b758 pc=0x64a7b7ecde70 github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x64a7b8d1a4f8?, 0xc004b66000?}, 0xc0005f7b40?) <autogenerated>:1 +0x36 fp=0xc00004baf0 sp=0xc00004bac0 pc=0x64a7b7ed0a36 net/http.HandlerFunc.ServeHTTP(0xc00065c780?, {0x64a7b8d1a4f8?, 0xc004b66000?}, 0xc0005f7b60?) net/http/server.go:2294 +0x29 fp=0xc00004bb18 sp=0xc00004baf0 pc=0x64a7b7d40a49 net/http.(*ServeMux).ServeHTTP(0x64a7b79e7125?, {0x64a7b8d1a4f8, 0xc004b66000}, 0xc004ac5400) net/http/server.go:2822 +0x1c4 fp=0xc00004bb68 sp=0xc00004bb18 pc=0x64a7b7d42944 net/http.serverHandler.ServeHTTP({0x64a7b8d16b10?}, {0x64a7b8d1a4f8?, 0xc004b66000?}, 0x1?) net/http/server.go:3301 +0x8e fp=0xc00004bb98 sp=0xc00004bb68 pc=0x64a7b7d603ce net/http.(*conn).serve(0xc0005c2120, {0x64a7b8d1c5a8, 0xc0005c0b10}) net/http/server.go:2102 +0x625 fp=0xc00004bfb8 sp=0xc00004bb98 pc=0x64a7b7d3ef45 net/http.(*Server).Serve.gowrap3() net/http/server.go:3454 +0x28 fp=0xc00004bfe0 sp=0xc00004bfb8 pc=0x64a7b7d44808 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00004bfe8 sp=0xc00004bfe0 pc=0x64a7b7a4a021 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3454 +0x485 rax 0x7e6eb833d960 rbx 0x7e6eb833d8d0 rcx 0x2 rdx 0x7e707c1c7840 rdi 0x0 rsi 0x7e6e6db39030 rbp 0x7e707c1c7830 rsp 0x7e70937fdc48 r8 0x4 r9 0xc00011c040 r10 0x1 r11 0x206 r12 0x0 r13 0x7e6eb80038c8 r14 0xa52 r15 0x7e6eb833d8d0 rip 0x64a7b881cfd0 rflags 0x10206 cs 0x33 fs 0x0 gs 0x0 [GIN] 2025/03/12 - 14:23:47 | 500 | 7.998409014s | 172.17.0.2 | POST "/api/chat" time=2025-03-12T14:23:47.709Z level=ERROR source=server.go:449 msg="llama runner terminated" error="exit status 2" ``` here's the logs when using ollama run.. ```2025/03/12 14:42:20 routes.go:1225: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/arch/Desktop/AIstuff/llms/models/ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" time=2025-03-12T14:42:20.304Z level=INFO source=images.go:432 msg="total blobs: 55" time=2025-03-12T14:42:20.304Z level=INFO source=images.go:439 msg="total unused blobs removed: 0" time=2025-03-12T14:42:20.304Z level=INFO source=routes.go:1292 msg="Listening on [::]:11434 (version 0.6.0)" time=2025-03-12T14:42:20.304Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs" time=2025-03-12T14:42:20.540Z level=INFO source=types.go:130 msg="inference compute" id=GPU-07f2832b-0aaa-0fda-9cc3-d53aecc3d142 library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4090 Laptop GPU" total="15.6 GiB" available="14.5 GiB" [GIN] 2025/03/12 - 14:42:26 | 200 | 27.388µs | 127.0.0.1 | HEAD "/" [GIN] 2025/03/12 - 14:42:26 | 200 | 26.808883ms | 127.0.0.1 | POST "/api/show" time=2025-03-12T14:42:26.763Z level=INFO source=server.go:105 msg="system memory" total="62.4 GiB" free="54.9 GiB" free_swap="0 B" time=2025-03-12T14:42:26.764Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=49 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="18.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="992.0 MiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="15.3 GiB" memory.weights.repeating="14.2 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" time=2025-03-12T14:42:26.813Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:42:26.815Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-03-12T14:42:26.816Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:42:26.819Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-03-12T14:42:26.819Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-03-12T14:42:26.819Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-03-12T14:42:26.819Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-03-12T14:42:26.819Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30 time=2025-03-12T14:42:26.819Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-03-12T14:42:26.819Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 --ctx-size 2048 --batch-size 512 --n-gpu-layers 49 --threads 6 --parallel 1 --port 38197" time=2025-03-12T14:42:26.820Z level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-03-12T14:42:26.820Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-03-12T14:42:26.820Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-03-12T14:42:26.827Z level=INFO source=runner.go:882 msg="starting ollama engine" time=2025-03-12T14:42:26.827Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:38197" time=2025-03-12T14:42:26.875Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default="" time=2025-03-12T14:42:26.875Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default="" time=2025-03-12T14:42:26.875Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=36 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 4090 Laptop GPU, compute capability 8.9, VMM: yes load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so time=2025-03-12T14:42:26.930Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-03-12T14:42:27.037Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="6.0 GiB" time=2025-03-12T14:42:27.037Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="11.2 GiB" time=2025-03-12T14:42:27.072Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-03-12T14:42:29.261Z level=INFO source=ggml.go:356 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 time=2025-03-12T14:42:29.261Z level=INFO source=ggml.go:356 msg="compute graph" backend=CPU buffer_type=CUDA_Host time=2025-03-12T14:42:29.262Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:42:29.263Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-03-12T14:42:29.264Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T14:42:29.267Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-03-12T14:42:29.267Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-03-12T14:42:29.267Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-03-12T14:42:29.267Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-03-12T14:42:29.267Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30 time=2025-03-12T14:42:29.267Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-03-12T14:42:29.343Z level=INFO source=server.go:624 msg="llama runner started in 2.52 seconds" [GIN] 2025/03/12 - 14:42:29 | 200 | 3.024439048s | 127.0.0.1 | POST "/api/generate" [GIN] 2025/03/12 - 14:42:48 | 200 | 4.071745585s | 127.0.0.1 | POST "/api/chat" ```

GiteaMirror commented

2026-05-04 13:41:34 -05:00

@steren commented on GitHub (Mar 12, 2025):

I am hitting the same issue when deploying to Google Cloud Run. Which offers NVIDIA L4 GPUs that has 24GB of vRAM on an instance with 32GB of RAM

4b and 12b work without issue

Here are my logs:

ERROR 2025-03-12T14:49:05.102075Z [httpRequest.requestMethod: POST] [httpRequest.status: 500] [httpRequest.responseSize: 165 B] [httpRequest.latency: 5.352 s] [httpRequest.userAgent: ollama/0.6.0 (amd64 linux) Go/go1.24.0] https://ollama-250756049697.us-central1.run.app/api/generate
DEFAULT 2025-03-12T14:49:05.367645Z time=2025-03-12T14:49:05.365Z level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 gpu=GPU-1f3cb124-028c-8cbf-9910-e49db5cfbbcd parallel=4 available=23603838976 required="20.6 GiB"
DEFAULT 2025-03-12T14:49:05.437706Z time=2025-03-12T14:49:05.435Z level=INFO source=server.go:105 msg="system memory" total="31.3 GiB" free="14.4 GiB" free_swap="0 B"
DEFAULT 2025-03-12T14:49:05.439057Z time=2025-03-12T14:49:05.436Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split="" memory.available="[22.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.6 GiB" memory.required.partial="20.6 GiB" memory.required.kv="3.9 GiB" memory.required.allocations="[20.6 GiB]" memory.weights.total="18.2 GiB" memory.weights.repeating="17.1 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="565.0 MiB" memory.graph.partial="1.6 GiB"
DEFAULT 2025-03-12T14:49:05.587054Z time=2025-03-12T14:49:05.584Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
DEFAULT 2025-03-12T14:49:05.591923Z time=2025-03-12T14:49:05.589Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
DEFAULT 2025-03-12T14:49:05.596248Z time=2025-03-12T14:49:05.594Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
DEFAULT 2025-03-12T14:49:05.605531Z time=2025-03-12T14:49:05.603Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
DEFAULT 2025-03-12T14:49:05.605541Z time=2025-03-12T14:49:05.603Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
DEFAULT 2025-03-12T14:49:05.605546Z time=2025-03-12T14:49:05.603Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
DEFAULT 2025-03-12T14:49:05.605552Z time=2025-03-12T14:49:05.603Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
DEFAULT 2025-03-12T14:49:05.605557Z time=2025-03-12T14:49:05.603Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30
DEFAULT 2025-03-12T14:49:05.605561Z time=2025-03-12T14:49:05.603Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
DEFAULT 2025-03-12T14:49:05.605859Z time=2025-03-12T14:49:05.603Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 --ctx-size 8192 --batch-size 512 --n-gpu-layers 63 --threads 4 --no-mmap --parallel 4 --port 38003"
DEFAULT 2025-03-12T14:49:05.606828Z time=2025-03-12T14:49:05.604Z level=INFO source=sched.go:450 msg="loaded runners" count=1
DEFAULT 2025-03-12T14:49:05.606838Z time=2025-03-12T14:49:05.604Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
DEFAULT 2025-03-12T14:49:05.607085Z time=2025-03-12T14:49:05.605Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
DEFAULT 2025-03-12T14:49:05.624627Z time=2025-03-12T14:49:05.622Z level=INFO source=runner.go:882 msg="starting ollama engine"
DEFAULT 2025-03-12T14:49:05.626860Z time=2025-03-12T14:49:05.624Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:38003"
DEFAULT 2025-03-12T14:49:05.773332Z time=2025-03-12T14:49:05.771Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default=""
DEFAULT 2025-03-12T14:49:05.773344Z time=2025-03-12T14:49:05.771Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default=""
DEFAULT 2025-03-12T14:49:05.773348Z time=2025-03-12T14:49:05.771Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=36
DEFAULT 2025-03-12T14:49:05.858184Z time=2025-03-12T14:49:05.856Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
DEFAULT 2025-03-12T14:49:07.530615Z ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
DEFAULT 2025-03-12T14:49:07.530627Z ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
DEFAULT 2025-03-12T14:49:07.530631Z ggml_cuda_init: found 1 CUDA devices:
DEFAULT 2025-03-12T14:49:07.530635Z Device 0: NVIDIA L4, compute capability 8.9, VMM: yes
DEFAULT 2025-03-12T14:49:07.530816Z load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
DEFAULT 2025-03-12T14:49:07.600891Z load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-skylakex.so
DEFAULT 2025-03-12T14:49:07.601184Z time=2025-03-12T14:49:07.598Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.AVX512=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
DEFAULT 2025-03-12T14:49:07.710969Z time=2025-03-12T14:49:07.708Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="16.2 GiB"
DEFAULT 2025-03-12T14:49:07.710980Z time=2025-03-12T14:49:07.708Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="1.1 GiB"
DEFAULT 2025-03-12T14:49:08.072685Z time=2025-03-12T14:49:08.070Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding"
DEFAULT 2025-03-12T14:49:08.329904Z time=2025-03-12T14:49:08.327Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
DEFAULT 2025-03-12T14:49:09.801727Z time=2025-03-12T14:49:09.799Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding"
DEFAULT 2025-03-12T14:49:10.208920Z time=2025-03-12T14:49:10.207Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
DEFAULT 2025-03-12T14:49:10.459336Z time=2025-03-12T14:49:10.457Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: killed"
DEFAULT 2025-03-12T14:49:10.460166Z [GIN] 2025/03/12 - 14:49:10 | 500 | 5.35220097s | 146.148.56.220 | POST "/api/generate"
DEFAULT 2025-03-12T14:49:15.595658Z time=2025-03-12T14:49:15.594Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.13629568 model=/root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541
DEFAULT 2025-03-12T14:49:15.846047Z time=2025-03-12T14:49:15.844Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.3869796 model=/root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541
DEFAULT 2025-03-12T14:49:16.095508Z time=2025-03-12T14:49:16.094Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.636363222 model=/root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541

@steren commented on GitHub (Mar 12, 2025): I am hitting the same issue when deploying to Google Cloud Run. Which offers NVIDIA L4 GPUs that has 24GB of vRAM on an instance with 32GB of RAM 4b and 12b work without issue Here are my logs: ``` ERROR 2025-03-12T14:49:05.102075Z [httpRequest.requestMethod: POST] [httpRequest.status: 500] [httpRequest.responseSize: 165 B] [httpRequest.latency: 5.352 s] [httpRequest.userAgent: ollama/0.6.0 (amd64 linux) Go/go1.24.0] https://ollama-250756049697.us-central1.run.app/api/generate DEFAULT 2025-03-12T14:49:05.367645Z time=2025-03-12T14:49:05.365Z level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 gpu=GPU-1f3cb124-028c-8cbf-9910-e49db5cfbbcd parallel=4 available=23603838976 required="20.6 GiB" DEFAULT 2025-03-12T14:49:05.437706Z time=2025-03-12T14:49:05.435Z level=INFO source=server.go:105 msg="system memory" total="31.3 GiB" free="14.4 GiB" free_swap="0 B" DEFAULT 2025-03-12T14:49:05.439057Z time=2025-03-12T14:49:05.436Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split="" memory.available="[22.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.6 GiB" memory.required.partial="20.6 GiB" memory.required.kv="3.9 GiB" memory.required.allocations="[20.6 GiB]" memory.weights.total="18.2 GiB" memory.weights.repeating="17.1 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="565.0 MiB" memory.graph.partial="1.6 GiB" DEFAULT 2025-03-12T14:49:05.587054Z time=2025-03-12T14:49:05.584Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" DEFAULT 2025-03-12T14:49:05.591923Z time=2025-03-12T14:49:05.589Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false DEFAULT 2025-03-12T14:49:05.596248Z time=2025-03-12T14:49:05.594Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" DEFAULT 2025-03-12T14:49:05.605531Z time=2025-03-12T14:49:05.603Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 DEFAULT 2025-03-12T14:49:05.605541Z time=2025-03-12T14:49:05.603Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 DEFAULT 2025-03-12T14:49:05.605546Z time=2025-03-12T14:49:05.603Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 DEFAULT 2025-03-12T14:49:05.605552Z time=2025-03-12T14:49:05.603Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 DEFAULT 2025-03-12T14:49:05.605557Z time=2025-03-12T14:49:05.603Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30 DEFAULT 2025-03-12T14:49:05.605561Z time=2025-03-12T14:49:05.603Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 DEFAULT 2025-03-12T14:49:05.605859Z time=2025-03-12T14:49:05.603Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 --ctx-size 8192 --batch-size 512 --n-gpu-layers 63 --threads 4 --no-mmap --parallel 4 --port 38003" DEFAULT 2025-03-12T14:49:05.606828Z time=2025-03-12T14:49:05.604Z level=INFO source=sched.go:450 msg="loaded runners" count=1 DEFAULT 2025-03-12T14:49:05.606838Z time=2025-03-12T14:49:05.604Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding" DEFAULT 2025-03-12T14:49:05.607085Z time=2025-03-12T14:49:05.605Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" DEFAULT 2025-03-12T14:49:05.624627Z time=2025-03-12T14:49:05.622Z level=INFO source=runner.go:882 msg="starting ollama engine" DEFAULT 2025-03-12T14:49:05.626860Z time=2025-03-12T14:49:05.624Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:38003" DEFAULT 2025-03-12T14:49:05.773332Z time=2025-03-12T14:49:05.771Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default="" DEFAULT 2025-03-12T14:49:05.773344Z time=2025-03-12T14:49:05.771Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default="" DEFAULT 2025-03-12T14:49:05.773348Z time=2025-03-12T14:49:05.771Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=36 DEFAULT 2025-03-12T14:49:05.858184Z time=2025-03-12T14:49:05.856Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" DEFAULT 2025-03-12T14:49:07.530615Z ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no DEFAULT 2025-03-12T14:49:07.530627Z ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no DEFAULT 2025-03-12T14:49:07.530631Z ggml_cuda_init: found 1 CUDA devices: DEFAULT 2025-03-12T14:49:07.530635Z Device 0: NVIDIA L4, compute capability 8.9, VMM: yes DEFAULT 2025-03-12T14:49:07.530816Z load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so DEFAULT 2025-03-12T14:49:07.600891Z load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-skylakex.so DEFAULT 2025-03-12T14:49:07.601184Z time=2025-03-12T14:49:07.598Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.AVX512=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) DEFAULT 2025-03-12T14:49:07.710969Z time=2025-03-12T14:49:07.708Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="16.2 GiB" DEFAULT 2025-03-12T14:49:07.710980Z time=2025-03-12T14:49:07.708Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="1.1 GiB" DEFAULT 2025-03-12T14:49:08.072685Z time=2025-03-12T14:49:08.070Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding" DEFAULT 2025-03-12T14:49:08.329904Z time=2025-03-12T14:49:08.327Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" DEFAULT 2025-03-12T14:49:09.801727Z time=2025-03-12T14:49:09.799Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding" DEFAULT 2025-03-12T14:49:10.208920Z time=2025-03-12T14:49:10.207Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" DEFAULT 2025-03-12T14:49:10.459336Z time=2025-03-12T14:49:10.457Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: killed" DEFAULT 2025-03-12T14:49:10.460166Z [GIN] 2025/03/12 - 14:49:10 | 500 | 5.35220097s | 146.148.56.220 | POST "/api/generate" DEFAULT 2025-03-12T14:49:15.595658Z time=2025-03-12T14:49:15.594Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.13629568 model=/root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 DEFAULT 2025-03-12T14:49:15.846047Z time=2025-03-12T14:49:15.844Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.3869796 model=/root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 DEFAULT 2025-03-12T14:49:16.095508Z time=2025-03-12T14:49:16.094Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.636363222 model=/root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 ```

GiteaMirror commented

2026-05-04 13:41:35 -05:00

@rick-github commented on GitHub (Mar 12, 2025):

time=2025-03-12T14:23:29.240Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB"

ggml_backend_cuda_buffer_type_alloc_buffer: allocating 15554.32 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 16309888000

ollama allocated 14.3G or 14.5G available and them OOMed. See here for mitigations.

@rick-github commented on GitHub (Mar 12, 2025): ``` time=2025-03-12T14:23:29.240Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 15554.32 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 16309888000 ``` ollama allocated 14.3G or 14.5G available and them OOMed. See [here](https://github.com/ollama/ollama/issues/8597#issuecomment-2614533288) for mitigations.

GiteaMirror commented

2026-05-04 13:41:36 -05:00

@nachocheeseburger commented on GitHub (Mar 12, 2025):

time=2025-03-12T14:23:29.240Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB"

ggml_backend_cuda_buffer_type_alloc_buffer: allocating 15554.32 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 16309888000

ollama allocated 14.3G or 14.5G available and them OOMed. See here for mitigations.

but why does it work fine when using ollama run? shouldn't i have the same issue both ways?

@nachocheeseburger commented on GitHub (Mar 12, 2025): > ``` > time=2025-03-12T14:23:29.240Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB" > > ggml_backend_cuda_buffer_type_alloc_buffer: allocating 15554.32 MiB on device 0: cudaMalloc failed: out of memory > ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 16309888000 > ``` > > ollama allocated 14.3G or 14.5G available and them OOMed. See [here](https://github.com/ollama/ollama/issues/8597#issuecomment-2614533288) for mitigations. but why does it work fine when using ollama run? shouldn't i have the same issue both ways?

GiteaMirror commented

2026-05-04 13:41:39 -05:00

@rick-github commented on GitHub (Mar 12, 2025):

Do you have a log from using ollama run?

@rick-github commented on GitHub (Mar 12, 2025): Do you have a log from using `ollama run`?

GiteaMirror commented

2026-05-04 13:41:40 -05:00

@Fade78 commented on GitHub (Mar 12, 2025):

Using latest docker container, upgraded today to ollama 0.6.0 (using open-webui 0.5.20). Same problem: runs gemma3 models, except 27b (tested with very small context).

time=2025-03-12T15:34:20.706Z level=INFO source=sched.go:731 msg="new model will fit in available VRAM, loading" model=/root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 library=cuda parallel=1 required="20.2 GiB"
time=2025-03-12T15:34:20.848Z level=INFO source=server.go:105 msg="system memory" total="62.4 GiB" free="40.1 GiB" free_swap="235.8 GiB"
time=2025-03-12T15:34:20.849Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split=32,31 memory.available="[15.6 GiB 10.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.2 GiB" memory.required.partial="20.2 GiB" memory.required.kv="248.0 MiB" memory.required.allocations="[10.6 GiB 9.6 GiB]" memory.weights.total="14.6 GiB" memory.weights.repeating="13.5 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="1.6 GiB" memory.graph.partial="1.6 GiB"
time=2025-03-12T15:34:20.915Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T15:34:20.918Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-03-12T15:34:20.920Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T15:34:20.924Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-03-12T15:34:20.924Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-03-12T15:34:20.924Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-03-12T15:34:20.924Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-03-12T15:34:20.925Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30
time=2025-03-12T15:34:20.925Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-03-12T15:34:20.925Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 --ctx-size 512 --batch-size 512 --n-gpu-layers 63 --threads 8 --parallel 1 --tensor-split 32,31 --port 40645"
time=2025-03-12T15:34:20.925Z level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-12T15:34:20.925Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-03-12T15:34:20.925Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-03-12T15:34:20.933Z level=INFO source=runner.go:882 msg="starting ollama engine"
time=2025-03-12T15:34:20.933Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:40645"
time=2025-03-12T15:34:21.003Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default=""
time=2025-03-12T15:34:21.003Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default=""
time=2025-03-12T15:34:21.003Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=36
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2025-03-12T15:34:21.099Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-03-12T15:34:21.147Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="1.1 GiB"
time=2025-03-12T15:34:21.147Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="7.4 GiB"
time=2025-03-12T15:34:21.147Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA1 size="8.8 GiB"
time=2025-03-12T15:34:21.178Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-03-12T15:34:25.770Z level=INFO source=ggml.go:356 msg="compute graph" backend=CUDA0 buffer_type=CUDA0
time=2025-03-12T15:34:25.770Z level=INFO source=ggml.go:356 msg="compute graph" backend=CUDA1 buffer_type=CUDA1
time=2025-03-12T15:34:25.770Z level=INFO source=ggml.go:356 msg="compute graph" backend=CPU buffer_type=CUDA_Host
time=2025-03-12T15:34:25.770Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T15:34:25.771Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-03-12T15:34:25.773Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T15:34:25.776Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-03-12T15:34:25.776Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-03-12T15:34:25.776Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-03-12T15:34:25.776Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-03-12T15:34:25.776Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30
time=2025-03-12T15:34:25.776Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-03-12T15:34:25.828Z level=INFO source=server.go:624 msg="llama runner started in 4.90 seconds"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 1209.99 MiB on device 1: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA1 buffer of size 1268767232
SIGSEGV: segmentation violation
PC=0x618cf015bfd0 m=345 sigcode=1 addr=0x58
signal arrived during cgo execution

goroutine 9 gp=0xc000502c40 m=345 mp=0xc08f156808 [syscall]:
runtime.cgocall(0x618cf01affe0, 0xc000519b00)
	runtime/cgocall.go:167 +0x4b fp=0xc000519ad8 sp=0xc000519aa0 pc=0x618cef37e60b
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x7c495c013270, 0x7c506c1bbec0)
	_cgo_gotypes.go:485 +0x4a fp=0xc000519b00 sp=0xc000519ad8 pc=0x618cef7691aa
github.com/ollama/ollama/ml/backend/ggml.Context.Compute.func1(...)
	github.com/ollama/ollama/ml/backend/ggml/ggml.go:497
github.com/ollama/ollama/ml/backend/ggml.Context.Compute({0xc000420280, 0x7c4ae000d230, 0x7c506c1bbec0, 0x0, 0x2000}, {0xc09359aa70, 0x1, 0x618cf066b210?})
	github.com/ollama/ollama/ml/backend/ggml/ggml.go:497 +0xbd fp=0xc000519b90 sp=0xc000519b00 pc=0x618cef771a7d
github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc094789c80?, {0xc09359aa70?, 0x200?, 0xc0afc14008?})
	<autogenerated>:1 +0x72 fp=0xc000519c08 sp=0xc000519b90 pc=0x618cef7774f2
github.com/ollama/ollama/model.Forward({0x618cf0662c00, 0xc094789c80}, {0x618cf065a2b0, 0xc0003420e0}, {{0xc0afc12800, 0x112, 0x200}, {0xc0afc14008, 0x100, 0x155}, ...})
	github.com/ollama/ollama/model/model.go:303 +0x218 fp=0xc000519cf0 sp=0xc000519c08 pc=0x618cef79e718
github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc0004d6120)
	github.com/ollama/ollama/runner/ollamarunner/runner.go:395 +0x3bb fp=0xc000519f98 sp=0xc000519cf0 pc=0x618cef80adfb
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0004d6120, {0x618cf065b5e0, 0xc0000fb590})
	github.com/ollama/ollama/runner/ollamarunner/runner.go:321 +0x4e fp=0xc000519fb8 sp=0xc000519f98 pc=0x618cef80a9ee
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2()
	github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0x28 fp=0xc000519fe0 sp=0xc000519fb8 pc=0x618cef80f668
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000519fe8 sp=0xc000519fe0 pc=0x618cef389021
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0xa9c

goroutine 1 gp=0xc000002380 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000515648 sp=0xc000515628 pc=0x618cef3818ee
runtime.netpollblock(0xc000515698?, 0xef31b226?, 0x8c?)
	runtime/netpoll.go:575 +0xf7 fp=0xc000515680 sp=0xc000515648 pc=0x618cef3466f7
internal/poll.runtime_pollWait(0x7c50faa08eb0, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc0005156a0 sp=0xc000515680 pc=0x618cef380b05
internal/poll.(*pollDesc).wait(0xc000050d80?, 0x900324cfe?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0005156c8 sp=0xc0005156a0 pc=0x618cef407f87
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000050d80)
	internal/poll/fd_unix.go:620 +0x295 fp=0xc000515770 sp=0xc0005156c8 pc=0x618cef40d355
net.(*netFD).accept(0xc000050d80)
	net/fd_unix.go:172 +0x29 fp=0xc000515828 sp=0xc000515770 pc=0x618cef480169
net.(*TCPListener).accept(0xc00014cd40)
	net/tcpsock_posix.go:159 +0x1b fp=0xc000515878 sp=0xc000515828 pc=0x618cef495b1b
net.(*TCPListener).Accept(0xc00014cd40)
	net/tcpsock.go:380 +0x30 fp=0xc0005158a8 sp=0xc000515878 pc=0x618cef4949d0
net/http.(*onceCloseListener).Accept(0xc000246090?)
	<autogenerated>:1 +0x24 fp=0xc0005158c0 sp=0xc0005158a8 pc=0x618cef6abb44
net/http.(*Server).Serve(0xc000200600, {0x618cf0659318, 0xc00014cd40})
	net/http/server.go:3424 +0x30c fp=0xc0005159f0 sp=0xc0005158c0 pc=0x618cef68340c
github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000034170, 0x10, 0x11})
	github.com/ollama/ollama/runner/ollamarunner/runner.go:939 +0xe6a fp=0xc000515d08 sp=0xc0005159f0 pc=0x618cef80f3aa
github.com/ollama/ollama/runner.Execute({0xc000034150?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc000515d30 sp=0xc000515d08 pc=0x618cef80ff09
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000120f00?, {0x618cf01cb054?, 0x4?, 0x618cf01cb058?})
	github.com/ollama/ollama/cmd/cmd.go:1285 +0x45 fp=0xc000515d58 sp=0xc000515d30 pc=0x618ceff806a5
github.com/spf13/cobra.(*Command).execute(0xc0004c2f08, {0xc0001afe60, 0x11, 0x12})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000515e78 sp=0xc000515d58 pc=0x618cef4f92fc
github.com/spf13/cobra.(*Command).ExecuteC(0xc0004a6908)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000515f30 sp=0xc000515e78 pc=0x618cef4f9b45
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000515f50 sp=0xc000515f30 pc=0x618ceff80a0d
runtime.main()
	runtime/proc.go:283 +0x29d fp=0xc000515fe0 sp=0xc000515f50 pc=0x618cef34dcfd
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000515fe8 sp=0xc000515fe0 pc=0x618cef389021

goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x618cef3818ee
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.forcegchelper()
	runtime/proc.go:348 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x618cef34e038
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x618cef389021
created by runtime.init.7 in goroutine 1
	runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x618cef3818ee
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.bgsweep(0xc00003e080)
	runtime/mgcsweep.go:316 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x618cef33885f
runtime.gcenable.gowrap1()
	runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x618cef32cc45
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x618cef389021
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x618cf0381a00?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x618cef3818ee
runtime.goparkunlock(...)
	runtime/proc.go:441
runtime.(*scavengerState).park(0x618cf0ebfb40)
	runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x618cef3362a9
runtime.bgscavenge(0xc00003e080)
	runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x618cef336839
runtime.gcenable.gowrap2()
	runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x618cef32cbe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x618cef389021
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000084688?)
	runtime/proc.go:435 +0xce fp=0xc000084630 sp=0xc000084610 pc=0x618cef3818ee
runtime.runfinq()
	runtime/mfinal.go:196 +0x107 fp=0xc0000847e0 sp=0xc000084630 pc=0x618cef32bc07
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x618cef389021
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc0001de8c0 m=nil [chan receive]:
runtime.gopark(0xc000233720?, 0xc00023c078?, 0x60?, 0xf?, 0x618cef466ea8?)
	runtime/proc.go:435 +0xce fp=0xc091a90f18 sp=0xc091a90ef8 pc=0x618cef3818ee
runtime.chanrecv(0xc0000423f0, 0x0, 0x1)
	runtime/chan.go:664 +0x445 fp=0xc091a90f90 sp=0xc091a90f18 pc=0x618cef31de05
runtime.chanrecv1(0x0?, 0x0?)
	runtime/chan.go:506 +0x12 fp=0xc091a90fb8 sp=0xc091a90f90 pc=0x618cef31d992
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
	runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
	runtime/mgc.go:1799 +0x2f fp=0xc091a90fe0 sp=0xc091a90fb8 pc=0x618cef32fdef
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc091a90fe8 sp=0xc091a90fe0 pc=0x618cef389021
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
	runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc0001defc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000080738 sp=0xc000080718 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0000807c8 sp=0xc000080738 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000081738 sp=0xc000081718 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0000817c8 sp=0xc000081738 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0000817e0 sp=0xc0000817c8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 21 gp=0xc0001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0xb8be929d3d8d?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 22 gp=0xc000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0x618cf0647068?, 0xc0001120a0?, 0x1b?, 0xa?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000082738 sp=0xc000082718 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0000827c8 sp=0xc000082738 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0000827e0 sp=0xc0000827c8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000827e8 sp=0xc0000827e0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 23 gp=0xc000102c40 m=nil [GC worker (idle)]:
runtime.gopark(0xb8be9e4369f3?, 0x1?, 0x8f?, 0x24?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000082f38 sp=0xc000082f18 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000082fc8 sp=0xc000082f38 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000082fe0 sp=0xc000082fc8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000082fe8 sp=0xc000082fe0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 24 gp=0xc000102e00 m=nil [GC worker (idle)]:
runtime.gopark(0x618cf0f6e2c0?, 0x3?, 0xf4?, 0x29?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000083738 sp=0xc000083718 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc0000837c8 sp=0xc000083738 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc0000837e0 sp=0xc0000837c8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0000837e8 sp=0xc0000837e0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 25 gp=0xc000102fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x618cf0f6e2c0?, 0x1?, 0x58?, 0x56?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc000083f38 sp=0xc000083f18 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc000083fc8 sp=0xc000083f38 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc000083fe0 sp=0xc000083fc8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc000083fe8 sp=0xc000083fe0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 26 gp=0xc000103180 m=nil [GC worker (idle)]:
runtime.gopark(0x618cf0f6e2c0?, 0x1?, 0x4?, 0x67?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00011c738 sp=0xc00011c718 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00011c7c8 sp=0xc00011c738 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00011c7e0 sp=0xc00011c7c8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00011c7e8 sp=0xc00011c7e0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 27 gp=0xc000103340 m=nil [GC worker (idle)]:
runtime.gopark(0xb8be9e42685f?, 0x3?, 0xad?, 0x3b?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00011cf38 sp=0xc00011cf18 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00011cfc8 sp=0xc00011cf38 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00011cfe0 sp=0xc00011cfc8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00011cfe8 sp=0xc00011cfe0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 28 gp=0xc000103500 m=nil [GC worker (idle)]:
runtime.gopark(0xb8be9e11cedd?, 0x1?, 0x5b?, 0x1e?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00011d738 sp=0xc00011d718 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00011d7c8 sp=0xc00011d738 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00011d7e0 sp=0xc00011d7c8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00011d7e8 sp=0xc00011d7e0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 29 gp=0xc0001036c0 m=nil [GC worker (idle)]:
runtime.gopark(0xb8bea83aae99?, 0x1?, 0x65?, 0x97?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00011df38 sp=0xc00011df18 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00011dfc8 sp=0xc00011df38 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00011dfe0 sp=0xc00011dfc8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00011dfe8 sp=0xc00011dfe0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 30 gp=0xc000103880 m=nil [GC worker (idle)]:
runtime.gopark(0x618cf0f6e2c0?, 0x1?, 0x3c?, 0x70?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00011e738 sp=0xc00011e718 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00011e7c8 sp=0xc00011e738 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00011e7e0 sp=0xc00011e7c8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00011e7e8 sp=0xc00011e7e0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 31 gp=0xc000103a40 m=nil [GC worker (idle)]:
runtime.gopark(0xb8be9470b48f?, 0x1?, 0x8b?, 0xdd?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00011ef38 sp=0xc00011ef18 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00011efc8 sp=0xc00011ef38 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00011efe0 sp=0xc00011efc8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00011efe8 sp=0xc00011efe0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 32 gp=0xc000103c00 m=nil [GC worker (idle)]:
runtime.gopark(0x618cf0f6e2c0?, 0x1?, 0xf6?, 0x31?, 0x0?)
	runtime/proc.go:435 +0xce fp=0xc00011f738 sp=0xc00011f718 pc=0x618cef3818ee
runtime.gcBgMarkWorker(0xc0000439d0)
	runtime/mgc.go:1423 +0xe9 fp=0xc00011f7c8 sp=0xc00011f738 pc=0x618cef32f109
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x25 fp=0xc00011f7e0 sp=0xc00011f7c8 pc=0x618cef32efe5
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc00011f7e8 sp=0xc00011f7e0 pc=0x618cef389021
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x105

goroutine 10 gp=0xc0030ba380 m=nil [select]:
runtime.gopark(0xc0935d7a68?, 0x2?, 0x0?, 0x1a?, 0xc0935d780c?)
	runtime/proc.go:435 +0xce fp=0xc0935d7620 sp=0xc0935d7600 pc=0x618cef3818ee
runtime.selectgo(0xc0935d7a68, 0xc0935d7808, 0x112?, 0x0, 0x1?, 0x1)
	runtime/select.go:351 +0x837 fp=0xc0935d7758 sp=0xc0935d7620 pc=0x618cef3601f7
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc0004d6120, {0x618cf06594f8, 0xc09359e380}, 0xc00037e8c0)
	github.com/ollama/ollama/runner/ollamarunner/runner.go:649 +0xad0 fp=0xc0935d7ac0 sp=0xc0935d7758 pc=0x618cef80ce70
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x618cf06594f8?, 0xc09359e380?}, 0xc000519b40?)
	<autogenerated>:1 +0x36 fp=0xc0935d7af0 sp=0xc0935d7ac0 pc=0x618cef80fa36
net/http.HandlerFunc.ServeHTTP(0xc000143500?, {0x618cf06594f8?, 0xc09359e380?}, 0xc000519b60?)
	net/http/server.go:2294 +0x29 fp=0xc0935d7b18 sp=0xc0935d7af0 pc=0x618cef67fa49
net/http.(*ServeMux).ServeHTTP(0x618cef326125?, {0x618cf06594f8, 0xc09359e380}, 0xc00037e8c0)
	net/http/server.go:2822 +0x1c4 fp=0xc0935d7b68 sp=0xc0935d7b18 pc=0x618cef681944
net/http.serverHandler.ServeHTTP({0x618cf0655b10?}, {0x618cf06594f8?, 0xc09359e380?}, 0x1?)
	net/http/server.go:3301 +0x8e fp=0xc0935d7b98 sp=0xc0935d7b68 pc=0x618cef69f3ce
net/http.(*conn).serve(0xc000246090, {0x618cf065b5a8, 0xc0000fdb60})
	net/http/server.go:2102 +0x625 fp=0xc0935d7fb8 sp=0xc0935d7b98 pc=0x618cef67df45
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3454 +0x28 fp=0xc0935d7fe0 sp=0xc0935d7fb8 pc=0x618cef683808
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0935d7fe8 sp=0xc0935d7fe0 pc=0x618cef389021
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3454 +0x485

goroutine 1366 gp=0xc08e57f180 m=nil [IO wait]:
runtime.gopark(0xc0484665f8?, 0x618cef388c7c?, 0x20?, 0x66?, 0xb?)
	runtime/proc.go:435 +0xce fp=0xc0484665d8 sp=0xc0484665b8 pc=0x618cef3818ee
runtime.netpollblock(0x618cef3a4d78?, 0xef31b226?, 0x8c?)
	runtime/netpoll.go:575 +0xf7 fp=0xc048466610 sp=0xc0484665d8 pc=0x618cef3466f7
internal/poll.runtime_pollWait(0x7c50faa08d98, 0x72)
	runtime/netpoll.go:351 +0x85 fp=0xc048466630 sp=0xc048466610 pc=0x618cef380b05
internal/poll.(*pollDesc).wait(0xc000050080?, 0xc000686581?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc048466658 sp=0xc048466630 pc=0x618cef407f87
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000050080, {0xc000686581, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x27a fp=0xc0484666f0 sp=0xc048466658 pc=0x618cef40927a
net.(*netFD).Read(0xc000050080, {0xc000686581?, 0xc00014c518?, 0xc048466770?})
	net/fd_posix.go:55 +0x25 fp=0xc048466738 sp=0xc0484666f0 pc=0x618cef47e1c5
net.(*conn).Read(0xc000598000, {0xc000686581?, 0xc003078ac0?, 0x618cef767e40?})
	net/net.go:194 +0x45 fp=0xc048466780 sp=0xc048466738 pc=0x618cef48c585
net/http.(*connReader).backgroundRead(0xc000686570)
	net/http/server.go:690 +0x37 fp=0xc0484667c8 sp=0xc048466780 pc=0x618cef677e17
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:686 +0x25 fp=0xc0484667e0 sp=0xc0484667c8 pc=0x618cef677d45
runtime.goexit({})
	runtime/asm_amd64.s:1700 +0x1 fp=0xc0484667e8 sp=0xc0484667e0 pc=0x618cef389021
created by net/http.(*connReader).startBackgroundRead in goroutine 10
	net/http/server.go:686 +0xb6

rax    0x7c495c38de40
rbx    0x7c495c38ddb0
rcx    0x3
rdx    0x7c4ae01d9a60
rdi    0x0
rsi    0x7c4890b3f030
rbp    0x7c4ae01d9a58
rsp    0x7c494bffec78
r8     0x4
r9     0xc00007c048
r10    0x1
r11    0x206
r12    0x1
r13    0x7c495c0133c8
r14    0xe99
r15    0x7c495c38ddb0
rip    0x618cf015bfd0
rflags 0x10202
cs     0x33
fs     0x0
gs     0x0

@Fade78 commented on GitHub (Mar 12, 2025): Using latest docker container, upgraded today to ollama 0.6.0 (using open-webui 0.5.20). Same problem: runs gemma3 models, except 27b (tested with very small context). ``` time=2025-03-12T15:34:20.706Z level=INFO source=sched.go:731 msg="new model will fit in available VRAM, loading" model=/root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 library=cuda parallel=1 required="20.2 GiB" time=2025-03-12T15:34:20.848Z level=INFO source=server.go:105 msg="system memory" total="62.4 GiB" free="40.1 GiB" free_swap="235.8 GiB" time=2025-03-12T15:34:20.849Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split=32,31 memory.available="[15.6 GiB 10.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.2 GiB" memory.required.partial="20.2 GiB" memory.required.kv="248.0 MiB" memory.required.allocations="[10.6 GiB 9.6 GiB]" memory.weights.total="14.6 GiB" memory.weights.repeating="13.5 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="1.6 GiB" memory.graph.partial="1.6 GiB" time=2025-03-12T15:34:20.915Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T15:34:20.918Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-03-12T15:34:20.920Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T15:34:20.924Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-03-12T15:34:20.924Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-03-12T15:34:20.924Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-03-12T15:34:20.924Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-03-12T15:34:20.925Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30 time=2025-03-12T15:34:20.925Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-03-12T15:34:20.925Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 --ctx-size 512 --batch-size 512 --n-gpu-layers 63 --threads 8 --parallel 1 --tensor-split 32,31 --port 40645" time=2025-03-12T15:34:20.925Z level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-03-12T15:34:20.925Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-03-12T15:34:20.925Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-03-12T15:34:20.933Z level=INFO source=runner.go:882 msg="starting ollama engine" time=2025-03-12T15:34:20.933Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:40645" time=2025-03-12T15:34:21.003Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default="" time=2025-03-12T15:34:21.003Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default="" time=2025-03-12T15:34:21.003Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=36 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 2 CUDA devices: Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes Device 1: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so time=2025-03-12T15:34:21.099Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-03-12T15:34:21.147Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="1.1 GiB" time=2025-03-12T15:34:21.147Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="7.4 GiB" time=2025-03-12T15:34:21.147Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA1 size="8.8 GiB" time=2025-03-12T15:34:21.178Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-03-12T15:34:25.770Z level=INFO source=ggml.go:356 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 time=2025-03-12T15:34:25.770Z level=INFO source=ggml.go:356 msg="compute graph" backend=CUDA1 buffer_type=CUDA1 time=2025-03-12T15:34:25.770Z level=INFO source=ggml.go:356 msg="compute graph" backend=CPU buffer_type=CUDA_Host time=2025-03-12T15:34:25.770Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T15:34:25.771Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-03-12T15:34:25.773Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T15:34:25.776Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-03-12T15:34:25.776Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-03-12T15:34:25.776Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-03-12T15:34:25.776Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-03-12T15:34:25.776Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30 time=2025-03-12T15:34:25.776Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-03-12T15:34:25.828Z level=INFO source=server.go:624 msg="llama runner started in 4.90 seconds" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 1209.99 MiB on device 1: cudaMalloc failed: out of memory ggml_gallocr_reserve_n: failed to allocate CUDA1 buffer of size 1268767232 SIGSEGV: segmentation violation PC=0x618cf015bfd0 m=345 sigcode=1 addr=0x58 signal arrived during cgo execution goroutine 9 gp=0xc000502c40 m=345 mp=0xc08f156808 [syscall]: runtime.cgocall(0x618cf01affe0, 0xc000519b00) runtime/cgocall.go:167 +0x4b fp=0xc000519ad8 sp=0xc000519aa0 pc=0x618cef37e60b github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x7c495c013270, 0x7c506c1bbec0) _cgo_gotypes.go:485 +0x4a fp=0xc000519b00 sp=0xc000519ad8 pc=0x618cef7691aa github.com/ollama/ollama/ml/backend/ggml.Context.Compute.func1(...) github.com/ollama/ollama/ml/backend/ggml/ggml.go:497 github.com/ollama/ollama/ml/backend/ggml.Context.Compute({0xc000420280, 0x7c4ae000d230, 0x7c506c1bbec0, 0x0, 0x2000}, {0xc09359aa70, 0x1, 0x618cf066b210?}) github.com/ollama/ollama/ml/backend/ggml/ggml.go:497 +0xbd fp=0xc000519b90 sp=0xc000519b00 pc=0x618cef771a7d github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc094789c80?, {0xc09359aa70?, 0x200?, 0xc0afc14008?}) <autogenerated>:1 +0x72 fp=0xc000519c08 sp=0xc000519b90 pc=0x618cef7774f2 github.com/ollama/ollama/model.Forward({0x618cf0662c00, 0xc094789c80}, {0x618cf065a2b0, 0xc0003420e0}, {{0xc0afc12800, 0x112, 0x200}, {0xc0afc14008, 0x100, 0x155}, ...}) github.com/ollama/ollama/model/model.go:303 +0x218 fp=0xc000519cf0 sp=0xc000519c08 pc=0x618cef79e718 github.com/ollama/ollama/runner/ollamarunner.(*Server).processBatch(0xc0004d6120) github.com/ollama/ollama/runner/ollamarunner/runner.go:395 +0x3bb fp=0xc000519f98 sp=0xc000519cf0 pc=0x618cef80adfb github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0004d6120, {0x618cf065b5e0, 0xc0000fb590}) github.com/ollama/ollama/runner/ollamarunner/runner.go:321 +0x4e fp=0xc000519fb8 sp=0xc000519f98 pc=0x618cef80a9ee github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2() github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0x28 fp=0xc000519fe0 sp=0xc000519fb8 pc=0x618cef80f668 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000519fe8 sp=0xc000519fe0 pc=0x618cef389021 created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/ollamarunner/runner.go:919 +0xa9c goroutine 1 gp=0xc000002380 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000515648 sp=0xc000515628 pc=0x618cef3818ee runtime.netpollblock(0xc000515698?, 0xef31b226?, 0x8c?) runtime/netpoll.go:575 +0xf7 fp=0xc000515680 sp=0xc000515648 pc=0x618cef3466f7 internal/poll.runtime_pollWait(0x7c50faa08eb0, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc0005156a0 sp=0xc000515680 pc=0x618cef380b05 internal/poll.(*pollDesc).wait(0xc000050d80?, 0x900324cfe?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0005156c8 sp=0xc0005156a0 pc=0x618cef407f87 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0xc000050d80) internal/poll/fd_unix.go:620 +0x295 fp=0xc000515770 sp=0xc0005156c8 pc=0x618cef40d355 net.(*netFD).accept(0xc000050d80) net/fd_unix.go:172 +0x29 fp=0xc000515828 sp=0xc000515770 pc=0x618cef480169 net.(*TCPListener).accept(0xc00014cd40) net/tcpsock_posix.go:159 +0x1b fp=0xc000515878 sp=0xc000515828 pc=0x618cef495b1b net.(*TCPListener).Accept(0xc00014cd40) net/tcpsock.go:380 +0x30 fp=0xc0005158a8 sp=0xc000515878 pc=0x618cef4949d0 net/http.(*onceCloseListener).Accept(0xc000246090?) <autogenerated>:1 +0x24 fp=0xc0005158c0 sp=0xc0005158a8 pc=0x618cef6abb44 net/http.(*Server).Serve(0xc000200600, {0x618cf0659318, 0xc00014cd40}) net/http/server.go:3424 +0x30c fp=0xc0005159f0 sp=0xc0005158c0 pc=0x618cef68340c github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000034170, 0x10, 0x11}) github.com/ollama/ollama/runner/ollamarunner/runner.go:939 +0xe6a fp=0xc000515d08 sp=0xc0005159f0 pc=0x618cef80f3aa github.com/ollama/ollama/runner.Execute({0xc000034150?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc000515d30 sp=0xc000515d08 pc=0x618cef80ff09 github.com/ollama/ollama/cmd.NewCLI.func2(0xc000120f00?, {0x618cf01cb054?, 0x4?, 0x618cf01cb058?}) github.com/ollama/ollama/cmd/cmd.go:1285 +0x45 fp=0xc000515d58 sp=0xc000515d30 pc=0x618ceff806a5 github.com/spf13/cobra.(*Command).execute(0xc0004c2f08, {0xc0001afe60, 0x11, 0x12}) github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000515e78 sp=0xc000515d58 pc=0x618cef4f92fc github.com/spf13/cobra.(*Command).ExecuteC(0xc0004a6908) github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000515f30 sp=0xc000515e78 pc=0x618cef4f9b45 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000515f50 sp=0xc000515f30 pc=0x618ceff80a0d runtime.main() runtime/proc.go:283 +0x29d fp=0xc000515fe0 sp=0xc000515f50 pc=0x618cef34dcfd runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000515fe8 sp=0xc000515fe0 pc=0x618cef389021 goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x618cef3818ee runtime.goparkunlock(...) runtime/proc.go:441 runtime.forcegchelper() runtime/proc.go:348 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x618cef34e038 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x618cef389021 created by runtime.init.7 in goroutine 1 runtime/proc.go:336 +0x1a goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x618cef3818ee runtime.goparkunlock(...) runtime/proc.go:441 runtime.bgsweep(0xc00003e080) runtime/mgcsweep.go:316 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x618cef33885f runtime.gcenable.gowrap1() runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x618cef32cc45 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x618cef389021 created by runtime.gcenable in goroutine 1 runtime/mgc.go:204 +0x66 goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x618cf0381a00?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x618cef3818ee runtime.goparkunlock(...) runtime/proc.go:441 runtime.(*scavengerState).park(0x618cf0ebfb40) runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x618cef3362a9 runtime.bgscavenge(0xc00003e080) runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x618cef336839 runtime.gcenable.gowrap2() runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x618cef32cbe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x618cef389021 created by runtime.gcenable in goroutine 1 runtime/mgc.go:205 +0xa5 goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]: runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000084688?) runtime/proc.go:435 +0xce fp=0xc000084630 sp=0xc000084610 pc=0x618cef3818ee runtime.runfinq() runtime/mfinal.go:196 +0x107 fp=0xc0000847e0 sp=0xc000084630 pc=0x618cef32bc07 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x618cef389021 created by runtime.createfing in goroutine 1 runtime/mfinal.go:166 +0x3d goroutine 6 gp=0xc0001de8c0 m=nil [chan receive]: runtime.gopark(0xc000233720?, 0xc00023c078?, 0x60?, 0xf?, 0x618cef466ea8?) runtime/proc.go:435 +0xce fp=0xc091a90f18 sp=0xc091a90ef8 pc=0x618cef3818ee runtime.chanrecv(0xc0000423f0, 0x0, 0x1) runtime/chan.go:664 +0x445 fp=0xc091a90f90 sp=0xc091a90f18 pc=0x618cef31de05 runtime.chanrecv1(0x0?, 0x0?) runtime/chan.go:506 +0x12 fp=0xc091a90fb8 sp=0xc091a90f90 pc=0x618cef31d992 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() runtime/mgc.go:1799 +0x2f fp=0xc091a90fe0 sp=0xc091a90fb8 pc=0x618cef32fdef runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc091a90fe8 sp=0xc091a90fe0 pc=0x618cef389021 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 runtime/mgc.go:1794 +0x85 goroutine 7 gp=0xc0001defc0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 18 gp=0xc000102380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000080738 sp=0xc000080718 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc0000807c8 sp=0xc000080738 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 19 gp=0xc000102540 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 20 gp=0xc000102700 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000081738 sp=0xc000081718 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc0000817c8 sp=0xc000081738 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000817e0 sp=0xc0000817c8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 21 gp=0xc0001028c0 m=nil [GC worker (idle)]: runtime.gopark(0xb8be929d3d8d?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 22 gp=0xc000102a80 m=nil [GC worker (idle)]: runtime.gopark(0x618cf0647068?, 0xc0001120a0?, 0x1b?, 0xa?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000082738 sp=0xc000082718 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc0000827c8 sp=0xc000082738 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000827e0 sp=0xc0000827c8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000827e8 sp=0xc0000827e0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 23 gp=0xc000102c40 m=nil [GC worker (idle)]: runtime.gopark(0xb8be9e4369f3?, 0x1?, 0x8f?, 0x24?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000082f38 sp=0xc000082f18 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc000082fc8 sp=0xc000082f38 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000082fe0 sp=0xc000082fc8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000082fe8 sp=0xc000082fe0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 24 gp=0xc000102e00 m=nil [GC worker (idle)]: runtime.gopark(0x618cf0f6e2c0?, 0x3?, 0xf4?, 0x29?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000083738 sp=0xc000083718 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc0000837c8 sp=0xc000083738 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000837e0 sp=0xc0000837c8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000837e8 sp=0xc0000837e0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 25 gp=0xc000102fc0 m=nil [GC worker (idle)]: runtime.gopark(0x618cf0f6e2c0?, 0x1?, 0x58?, 0x56?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000083f38 sp=0xc000083f18 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc000083fc8 sp=0xc000083f38 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000083fe0 sp=0xc000083fc8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000083fe8 sp=0xc000083fe0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 26 gp=0xc000103180 m=nil [GC worker (idle)]: runtime.gopark(0x618cf0f6e2c0?, 0x1?, 0x4?, 0x67?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011c738 sp=0xc00011c718 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc00011c7c8 sp=0xc00011c738 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011c7e0 sp=0xc00011c7c8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011c7e8 sp=0xc00011c7e0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 27 gp=0xc000103340 m=nil [GC worker (idle)]: runtime.gopark(0xb8be9e42685f?, 0x3?, 0xad?, 0x3b?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011cf38 sp=0xc00011cf18 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc00011cfc8 sp=0xc00011cf38 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011cfe0 sp=0xc00011cfc8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011cfe8 sp=0xc00011cfe0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 28 gp=0xc000103500 m=nil [GC worker (idle)]: runtime.gopark(0xb8be9e11cedd?, 0x1?, 0x5b?, 0x1e?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011d738 sp=0xc00011d718 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc00011d7c8 sp=0xc00011d738 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011d7e0 sp=0xc00011d7c8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011d7e8 sp=0xc00011d7e0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 29 gp=0xc0001036c0 m=nil [GC worker (idle)]: runtime.gopark(0xb8bea83aae99?, 0x1?, 0x65?, 0x97?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011df38 sp=0xc00011df18 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc00011dfc8 sp=0xc00011df38 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011dfe0 sp=0xc00011dfc8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011dfe8 sp=0xc00011dfe0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 30 gp=0xc000103880 m=nil [GC worker (idle)]: runtime.gopark(0x618cf0f6e2c0?, 0x1?, 0x3c?, 0x70?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011e738 sp=0xc00011e718 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc00011e7c8 sp=0xc00011e738 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011e7e0 sp=0xc00011e7c8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011e7e8 sp=0xc00011e7e0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 31 gp=0xc000103a40 m=nil [GC worker (idle)]: runtime.gopark(0xb8be9470b48f?, 0x1?, 0x8b?, 0xdd?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011ef38 sp=0xc00011ef18 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc00011efc8 sp=0xc00011ef38 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011efe0 sp=0xc00011efc8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011efe8 sp=0xc00011efe0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 32 gp=0xc000103c00 m=nil [GC worker (idle)]: runtime.gopark(0x618cf0f6e2c0?, 0x1?, 0xf6?, 0x31?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011f738 sp=0xc00011f718 pc=0x618cef3818ee runtime.gcBgMarkWorker(0xc0000439d0) runtime/mgc.go:1423 +0xe9 fp=0xc00011f7c8 sp=0xc00011f738 pc=0x618cef32f109 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011f7e0 sp=0xc00011f7c8 pc=0x618cef32efe5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011f7e8 sp=0xc00011f7e0 pc=0x618cef389021 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 10 gp=0xc0030ba380 m=nil [select]: runtime.gopark(0xc0935d7a68?, 0x2?, 0x0?, 0x1a?, 0xc0935d780c?) runtime/proc.go:435 +0xce fp=0xc0935d7620 sp=0xc0935d7600 pc=0x618cef3818ee runtime.selectgo(0xc0935d7a68, 0xc0935d7808, 0x112?, 0x0, 0x1?, 0x1) runtime/select.go:351 +0x837 fp=0xc0935d7758 sp=0xc0935d7620 pc=0x618cef3601f7 github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc0004d6120, {0x618cf06594f8, 0xc09359e380}, 0xc00037e8c0) github.com/ollama/ollama/runner/ollamarunner/runner.go:649 +0xad0 fp=0xc0935d7ac0 sp=0xc0935d7758 pc=0x618cef80ce70 github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x618cf06594f8?, 0xc09359e380?}, 0xc000519b40?) <autogenerated>:1 +0x36 fp=0xc0935d7af0 sp=0xc0935d7ac0 pc=0x618cef80fa36 net/http.HandlerFunc.ServeHTTP(0xc000143500?, {0x618cf06594f8?, 0xc09359e380?}, 0xc000519b60?) net/http/server.go:2294 +0x29 fp=0xc0935d7b18 sp=0xc0935d7af0 pc=0x618cef67fa49 net/http.(*ServeMux).ServeHTTP(0x618cef326125?, {0x618cf06594f8, 0xc09359e380}, 0xc00037e8c0) net/http/server.go:2822 +0x1c4 fp=0xc0935d7b68 sp=0xc0935d7b18 pc=0x618cef681944 net/http.serverHandler.ServeHTTP({0x618cf0655b10?}, {0x618cf06594f8?, 0xc09359e380?}, 0x1?) net/http/server.go:3301 +0x8e fp=0xc0935d7b98 sp=0xc0935d7b68 pc=0x618cef69f3ce net/http.(*conn).serve(0xc000246090, {0x618cf065b5a8, 0xc0000fdb60}) net/http/server.go:2102 +0x625 fp=0xc0935d7fb8 sp=0xc0935d7b98 pc=0x618cef67df45 net/http.(*Server).Serve.gowrap3() net/http/server.go:3454 +0x28 fp=0xc0935d7fe0 sp=0xc0935d7fb8 pc=0x618cef683808 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0935d7fe8 sp=0xc0935d7fe0 pc=0x618cef389021 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3454 +0x485 goroutine 1366 gp=0xc08e57f180 m=nil [IO wait]: runtime.gopark(0xc0484665f8?, 0x618cef388c7c?, 0x20?, 0x66?, 0xb?) runtime/proc.go:435 +0xce fp=0xc0484665d8 sp=0xc0484665b8 pc=0x618cef3818ee runtime.netpollblock(0x618cef3a4d78?, 0xef31b226?, 0x8c?) runtime/netpoll.go:575 +0xf7 fp=0xc048466610 sp=0xc0484665d8 pc=0x618cef3466f7 internal/poll.runtime_pollWait(0x7c50faa08d98, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc048466630 sp=0xc048466610 pc=0x618cef380b05 internal/poll.(*pollDesc).wait(0xc000050080?, 0xc000686581?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc048466658 sp=0xc048466630 pc=0x618cef407f87 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc000050080, {0xc000686581, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x27a fp=0xc0484666f0 sp=0xc048466658 pc=0x618cef40927a net.(*netFD).Read(0xc000050080, {0xc000686581?, 0xc00014c518?, 0xc048466770?}) net/fd_posix.go:55 +0x25 fp=0xc048466738 sp=0xc0484666f0 pc=0x618cef47e1c5 net.(*conn).Read(0xc000598000, {0xc000686581?, 0xc003078ac0?, 0x618cef767e40?}) net/net.go:194 +0x45 fp=0xc048466780 sp=0xc048466738 pc=0x618cef48c585 net/http.(*connReader).backgroundRead(0xc000686570) net/http/server.go:690 +0x37 fp=0xc0484667c8 sp=0xc048466780 pc=0x618cef677e17 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:686 +0x25 fp=0xc0484667e0 sp=0xc0484667c8 pc=0x618cef677d45 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0484667e8 sp=0xc0484667e0 pc=0x618cef389021 created by net/http.(*connReader).startBackgroundRead in goroutine 10 net/http/server.go:686 +0xb6 rax 0x7c495c38de40 rbx 0x7c495c38ddb0 rcx 0x3 rdx 0x7c4ae01d9a60 rdi 0x0 rsi 0x7c4890b3f030 rbp 0x7c4ae01d9a58 rsp 0x7c494bffec78 r8 0x4 r9 0xc00007c048 r10 0x1 r11 0x206 r12 0x1 r13 0x7c495c0133c8 r14 0xe99 r15 0x7c495c38ddb0 rip 0x618cf015bfd0 rflags 0x10202 cs 0x33 fs 0x0 gs 0x0 ```

GiteaMirror commented

2026-05-04 13:41:44 -05:00

@rick-github commented on GitHub (Mar 12, 2025):

ggml_backend_cuda_buffer_type_alloc_buffer: allocating 1209.99 MiB on device 1: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA1 buffer of size 1268767232

https://github.com/ollama/ollama/issues/8597#issuecomment-2614533288

@rick-github commented on GitHub (Mar 12, 2025): ``` ggml_backend_cuda_buffer_type_alloc_buffer: allocating 1209.99 MiB on device 1: cudaMalloc failed: out of memory ggml_gallocr_reserve_n: failed to allocate CUDA1 buffer of size 1268767232 ``` https://github.com/ollama/ollama/issues/8597#issuecomment-2614533288

GiteaMirror commented

2026-05-04 13:41:46 -05:00

@jarek7777 commented on GitHub (Mar 12, 2025):

My log

mar 12 16:08:32 ollama[1942066]: time=2025-03-12T16:08:32.458+01:00 level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.000904565 model=/usr/share/ollama/.ollama/models/blobs/sha256-daec91ffb5dd0c27411bd71f29932917c49cf529a641d0168496c3a501e3062c
mar 12 16:08:32 ollama[1942066]: time=2025-03-12T16:08:32.709+01:00 level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.251344829 model=/usr/share/ollama/.ollama/models/blobs/sha256-daec91ffb5dd0c27411bd71f29932917c49cf529a641d0168496c3a501e3062c
mar 12 16:08:32 ollama[1942066]: time=2025-03-12T16:08:32.958+01:00 level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.500742259 model=/usr/share/ollama/.ollama/models/blobs/sha256-daec91ffb5dd0c27411bd71f29932917c49cf529a641d0168496c3a501e3062c
mar 12 16:09:43 ollama[1942066]: [GIN] 2025/03/12 - 16:09:43 | 200 | 22.532µs | 127.0.0.1 | HEAD "/"
mar 12 16:09:43 ollama[1942066]: [GIN] 2025/03/12 - 16:09:43 | 200 | 39.670658ms | 127.0.0.1 | POST "/api/show"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.240+01:00 level=INFO source=sched.go:731 msg="new model will fit in available VRAM, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-8bf5daddfa5b7ee1f84fd3d759261439151106d8908ea064c4e4445afc8c8683 library=cuda parallel=3 required="58.9 GiB"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.447+01:00 level=INFO source=server.go:105 msg="system memory" total="60.5 GiB" free="47.6 GiB" free_swap="23.9 MiB"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.659+01:00 level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split=32,31 memory.available="[46.5 GiB 42.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="58.9 GiB" memory.required.partial="58.9 GiB" memory.required.kv="2.9 GiB" memory.required.allocations="[30.8 GiB 28.1 GiB]" memory.weights.total="50.6 GiB" memory.weights.repeating="48.0 GiB" memory.weights.nonrepeating="2.6 GiB" memory.graph.full="1.6 GiB" memory.graph.partial="1.6 GiB"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.659+01:00 level=INFO source=server.go:185 msg="enabling flash attention"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.659+01:00 level=WARN source=server.go:193 msg="kv cache type not supported by model" type=""
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.707+01:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.708+01:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.710+01:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-8bf5daddfa5b7ee1f84fd3d759261439151106d8908ea064c4e4445afc8c8683 --ctx-size 6144 --batch-size 512 --n-gpu-layers 63 --threads 16 --flash-attn --no-mmap --parallel 3 --tensor-split 32,31 --port 40059"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.720+01:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.720+01:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.720+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.725+01:00 level=INFO source=runner.go:882 msg="starting ollama engine"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.726+01:00 level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:40059"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.776+01:00 level=WARN source=ggml.go:149 msg="key not found" key=general.name default=""
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.776+01:00 level=WARN source=ggml.go:149 msg="key not found" key=general.description default=""
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.776+01:00 level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=F16 name="" description="" num_tensors=1247 num_key_values=36
mar 12 16:09:44 ollama[1942066]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
mar 12 16:09:44 ollama[1942066]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
mar 12 16:09:44 ollama[1942066]: ggml_cuda_init: found 2 CUDA devices:
mar 12 16:09:44 ollama[1942066]: Device 0: NVIDIA RTX 6000 Ada Generation, compute capability 8.9, VMM: yes
mar 12 16:09:44 ollama[1942066]: Device 1: NVIDIA RTX 6000 Ada Generation, compute capability 8.9, VMM: yes
mar 12 16:09:44 ollama[1942066]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
mar 12 16:09:44 ollama[1942066]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-icelake.so
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.885+01:00 level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.950+01:00 level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="24.6 GiB"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.950+01:00 level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA1 size="26.5 GiB"
mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.950+01:00 level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="2.6 GiB"
mar 12 16:09:45 ollama[1942066]: time=2025-03-12T16:09:45.173+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding"
mar 12 16:09:46 ollama[1942066]: time=2025-03-12T16:09:46.988+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
mar 12 16:09:47 ollama[1942066]: time=2025-03-12T16:09:47.732+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding"
mar 12 16:09:48 ollama[1942066]: time=2025-03-12T16:09:48.461+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
mar 12 16:09:52 ollama[1942066]: time=2025-03-12T16:09:52.694+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding"
mar 12 16:09:53 ollama[1942066]: time=2025-03-12T16:09:53.109+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
mar 12 16:09:53 ollama[1942066]: time=2025-03-12T16:09:53.562+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding"
mar 12 16:09:53 ollama[1942066]: time=2025-03-12T16:09:53.912+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
mar 12 16:09:58 systemd[1]: ollama.service: A process of this unit has been killed by the OOM killer.
mar 12 16:09:58 ollama[1942066]: time=2025-03-12T16:09:58.223+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
mar 12 16:09:58 ollama[1942066]: time=2025-03-12T16:09:58.473+01:00 level=WARN source=server.go:592 msg="client connection closed before server finished loading, aborting load"
mar 12 16:09:58 ollama[1942066]: time=2025-03-12T16:09:58.474+01:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="timed out waiting for llama runner to start: context canceled"
mar 12 16:09:58 ollama[1942066]: [GIN] 2025/03/12 - 16:09:58 | 499 | 15.173257953s | 127.0.0.1 | POST "/api/generate"
mar 12 16:10:00 systemd[1]: ollama.service: Failed with result 'oom-kill'.
mar 12 16:10:00 systemd[1]: ollama.service: Consumed 14min 53.613s CPU time, 49.2G memory peak, 0B memory swap peak.

@jarek7777 commented on GitHub (Mar 12, 2025): My log mar 12 16:08:32 ollama[1942066]: time=2025-03-12T16:08:32.458+01:00 level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.000904565 model=/usr/share/ollama/.ollama/models/blobs/sha256-daec91ffb5dd0c27411bd71f29932917c49cf529a641d0168496c3a501e3062c mar 12 16:08:32 ollama[1942066]: time=2025-03-12T16:08:32.709+01:00 level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.251344829 model=/usr/share/ollama/.ollama/models/blobs/sha256-daec91ffb5dd0c27411bd71f29932917c49cf529a641d0168496c3a501e3062c mar 12 16:08:32 ollama[1942066]: time=2025-03-12T16:08:32.958+01:00 level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.500742259 model=/usr/share/ollama/.ollama/models/blobs/sha256-daec91ffb5dd0c27411bd71f29932917c49cf529a641d0168496c3a501e3062c mar 12 16:09:43 ollama[1942066]: [GIN] 2025/03/12 - 16:09:43 | 200 | 22.532µs | 127.0.0.1 | HEAD "/" mar 12 16:09:43 ollama[1942066]: [GIN] 2025/03/12 - 16:09:43 | 200 | 39.670658ms | 127.0.0.1 | POST "/api/show" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.240+01:00 level=INFO source=sched.go:731 msg="new model will fit in available VRAM, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-8bf5daddfa5b7ee1f84fd3d759261439151106d8908ea064c4e4445afc8c8683 library=cuda parallel=3 required="58.9 GiB" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.447+01:00 level=INFO source=server.go:105 msg="system memory" total="60.5 GiB" free="47.6 GiB" free_swap="23.9 MiB" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.659+01:00 level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split=32,31 memory.available="[46.5 GiB 42.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="58.9 GiB" memory.required.partial="58.9 GiB" memory.required.kv="2.9 GiB" memory.required.allocations="[30.8 GiB 28.1 GiB]" memory.weights.total="50.6 GiB" memory.weights.repeating="48.0 GiB" memory.weights.nonrepeating="2.6 GiB" memory.graph.full="1.6 GiB" memory.graph.partial="1.6 GiB" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.659+01:00 level=INFO source=server.go:185 msg="enabling flash attention" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.659+01:00 level=WARN source=server.go:193 msg="kv cache type not supported by model" type="" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.707+01:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.708+01:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.710+01:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30 mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.719+01:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-8bf5daddfa5b7ee1f84fd3d759261439151106d8908ea064c4e4445afc8c8683 --ctx-size 6144 --batch-size 512 --n-gpu-layers 63 --threads 16 --flash-attn --no-mmap --parallel 3 --tensor-split 32,31 --port 40059" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.720+01:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.720+01:00 level=INFO source=server.go:585 msg="waiting for llama runner to start responding" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.720+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.725+01:00 level=INFO source=runner.go:882 msg="starting ollama engine" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.726+01:00 level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:40059" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.776+01:00 level=WARN source=ggml.go:149 msg="key not found" key=general.name default="" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.776+01:00 level=WARN source=ggml.go:149 msg="key not found" key=general.description default="" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.776+01:00 level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=F16 name="" description="" num_tensors=1247 num_key_values=36 mar 12 16:09:44 ollama[1942066]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no mar 12 16:09:44 ollama[1942066]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no mar 12 16:09:44 ollama[1942066]: ggml_cuda_init: found 2 CUDA devices: mar 12 16:09:44 ollama[1942066]: Device 0: NVIDIA RTX 6000 Ada Generation, compute capability 8.9, VMM: yes mar 12 16:09:44 ollama[1942066]: Device 1: NVIDIA RTX 6000 Ada Generation, compute capability 8.9, VMM: yes mar 12 16:09:44 ollama[1942066]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so mar 12 16:09:44 ollama[1942066]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-icelake.so mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.885+01:00 level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.950+01:00 level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="24.6 GiB" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.950+01:00 level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA1 size="26.5 GiB" mar 12 16:09:44 ollama[1942066]: time=2025-03-12T16:09:44.950+01:00 level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="2.6 GiB" mar 12 16:09:45 ollama[1942066]: time=2025-03-12T16:09:45.173+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding" mar 12 16:09:46 ollama[1942066]: time=2025-03-12T16:09:46.988+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" mar 12 16:09:47 ollama[1942066]: time=2025-03-12T16:09:47.732+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding" mar 12 16:09:48 ollama[1942066]: time=2025-03-12T16:09:48.461+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" mar 12 16:09:52 ollama[1942066]: time=2025-03-12T16:09:52.694+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding" mar 12 16:09:53 ollama[1942066]: time=2025-03-12T16:09:53.109+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" mar 12 16:09:53 ollama[1942066]: time=2025-03-12T16:09:53.562+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding" mar 12 16:09:53 ollama[1942066]: time=2025-03-12T16:09:53.912+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" mar 12 16:09:58 systemd[1]: ollama.service: A process of this unit has been killed by the OOM killer. mar 12 16:09:58 ollama[1942066]: time=2025-03-12T16:09:58.223+01:00 level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" mar 12 16:09:58 ollama[1942066]: time=2025-03-12T16:09:58.473+01:00 level=WARN source=server.go:592 msg="client connection closed before server finished loading, aborting load" mar 12 16:09:58 ollama[1942066]: time=2025-03-12T16:09:58.474+01:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="timed out waiting for llama runner to start: context canceled" mar 12 16:09:58 ollama[1942066]: [GIN] 2025/03/12 - 16:09:58 | 499 | 15.173257953s | 127.0.0.1 | POST "/api/generate" mar 12 16:10:00 systemd[1]: ollama.service: Failed with result 'oom-kill'. mar 12 16:10:00 systemd[1]: ollama.service: Consumed 14min 53.613s CPU time, 49.2G memory peak, 0B memory swap peak.

GiteaMirror commented

2026-05-04 13:41:52 -05:00

@nachocheeseburger commented on GitHub (Mar 12, 2025):

Do you have a log from using ollama run?

yes. i posted both here
https://github.com/ollama/ollama/issues/9685#issuecomment-2718169767

@nachocheeseburger commented on GitHub (Mar 12, 2025): > Do you have a log from using `ollama run`? yes. i posted both here https://github.com/ollama/ollama/issues/9685#issuecomment-2718169767

GiteaMirror commented

2026-05-04 13:41:58 -05:00

@rick-github commented on GitHub (Mar 12, 2025):

mar 12 16:09:58 systemd[1]: ollama.service: A process of this unit has been killed by the OOM killer.

@rick-github commented on GitHub (Mar 12, 2025): ``` mar 12 16:09:58 systemd[1]: ollama.service: A process of this unit has been killed by the OOM killer. ```

GiteaMirror commented

2026-05-04 13:42:00 -05:00

@JulienDeveaux commented on GitHub (Mar 12, 2025):

It's also giving me :

root@2507008298a3:/# ollama run gemma3:27b
Error: llama runner process has terminated: signal: killed

With logs :

time=2025/03/12 15:44:08 routes.go:1225: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-03-12T15:44:08.510Z level=INFO source=images.go:432 msg="total blobs: 54"
time=2025-03-12T15:44:08.514Z level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-12T15:44:08.516Z level=INFO source=routes.go:1292 msg="Listening on [::]:11434 (version 0.6.0)"
time=2025-03-12T15:44:08.516Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-12T15:44:08.673Z level=INFO source=types.go:130 msg="inference compute" id=GPU-bbc91b30-a8db-1cc2-17c5-0130f14ffc5b library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.3 GiB"
time=2025-03-12T15:44:22.609Z level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 gpu=GPU-bbc91b30-a8db-1cc2-17c5-0130f14ffc5b parallel=4 available=25020334080 required="20.6 GiB"
time=2025-03-12T15:44:22.683Z level=INFO source=server.go:105 msg="system memory" total="15.6 GiB" free="14.0 GiB" free_swap="2.7 GiB"
time=2025-03-12T15:44:22.684Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split="" memory.available="[23.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.6 GiB" memory.required.partial="20.6 GiB" memory.required.kv="3.9 GiB" memory.required.allocations="[20.6 GiB]" memory.weights.total="18.2 GiB" memory.weights.repeating="17.1 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="565.0 MiB" memory.graph.partial="1.6 GiB"
time=2025-03-12T15:44:22.795Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T15:44:22.800Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-03-12T15:44:22.803Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-03-12T15:44:22.811Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-03-12T15:44:22.811Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-03-12T15:44:22.811Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-03-12T15:44:22.811Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-03-12T15:44:22.811Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30
time=2025-03-12T15:44:22.811Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-03-12T15:44:22.811Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 --ctx-size 8192 --batch-size 512 --n-gpu-layers 63 --threads 4 --no-mmap --parallel 4 --port 37707"
time=2025-03-12T15:44:22.812Z level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-12T15:44:22.812Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding"
time=2025-03-12T15:44:22.812Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error"
time=2025-03-12T15:44:22.823Z level=INFO source=runner.go:882 msg="starting ollama engine"
time=2025-03-12T15:44:22.824Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:37707"
time=2025-03-12T15:44:22.940Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default=""
time=2025-03-12T15:44:22.940Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default=""
time=2025-03-12T15:44:22.940Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=36
time=2025-03-12T15:44:23.064Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
time=2025-03-12T15:44:23.168Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-03-12T15:44:23.244Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="16.2 GiB"
time=2025-03-12T15:44:23.244Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="1.1 GiB"
time=2025-03-12T15:44:35.063Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding"
time=2025-03-12T15:44:48.424Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-03-12T15:44:49.527Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding"
time=2025-03-12T15:44:49.837Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model"
time=2025-03-12T15:44:53.998Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding"
time=2025-03-12T15:44:55.197Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: killed"

I'm using the latest docker image

@JulienDeveaux commented on GitHub (Mar 12, 2025): It's also giving me : ``` root@2507008298a3:/# ollama run gemma3:27b Error: llama runner process has terminated: signal: killed ``` With logs : ``` time=2025/03/12 15:44:08 routes.go:1225: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" time=2025-03-12T15:44:08.510Z level=INFO source=images.go:432 msg="total blobs: 54" time=2025-03-12T15:44:08.514Z level=INFO source=images.go:439 msg="total unused blobs removed: 0" time=2025-03-12T15:44:08.516Z level=INFO source=routes.go:1292 msg="Listening on [::]:11434 (version 0.6.0)" time=2025-03-12T15:44:08.516Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs" time=2025-03-12T15:44:08.673Z level=INFO source=types.go:130 msg="inference compute" id=GPU-bbc91b30-a8db-1cc2-17c5-0130f14ffc5b library=cuda variant=v12 compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3090" total="23.7 GiB" available="23.3 GiB" time=2025-03-12T15:44:22.609Z level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 gpu=GPU-bbc91b30-a8db-1cc2-17c5-0130f14ffc5b parallel=4 available=25020334080 required="20.6 GiB" time=2025-03-12T15:44:22.683Z level=INFO source=server.go:105 msg="system memory" total="15.6 GiB" free="14.0 GiB" free_swap="2.7 GiB" time=2025-03-12T15:44:22.684Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split="" memory.available="[23.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.6 GiB" memory.required.partial="20.6 GiB" memory.required.kv="3.9 GiB" memory.required.allocations="[20.6 GiB]" memory.weights.total="18.2 GiB" memory.weights.repeating="17.1 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="565.0 MiB" memory.graph.partial="1.6 GiB" time=2025-03-12T15:44:22.795Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T15:44:22.800Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-03-12T15:44:22.803Z level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+" time=2025-03-12T15:44:22.811Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-03-12T15:44:22.811Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-03-12T15:44:22.811Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-03-12T15:44:22.811Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-03-12T15:44:22.811Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.final_logit_softcapping default=30 time=2025-03-12T15:44:22.811Z level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-03-12T15:44:22.811Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 --ctx-size 8192 --batch-size 512 --n-gpu-layers 63 --threads 4 --no-mmap --parallel 4 --port 37707" time=2025-03-12T15:44:22.812Z level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-03-12T15:44:22.812Z level=INFO source=server.go:585 msg="waiting for llama runner to start responding" time=2025-03-12T15:44:22.812Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server error" time=2025-03-12T15:44:22.823Z level=INFO source=runner.go:882 msg="starting ollama engine" time=2025-03-12T15:44:22.824Z level=INFO source=runner.go:938 msg="Server listening on 127.0.0.1:37707" time=2025-03-12T15:44:22.940Z level=WARN source=ggml.go:149 msg="key not found" key=general.name default="" time=2025-03-12T15:44:22.940Z level=WARN source=ggml.go:149 msg="key not found" key=general.description default="" time=2025-03-12T15:44:22.940Z level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1247 num_key_values=36 time=2025-03-12T15:44:23.064Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so time=2025-03-12T15:44:23.168Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-03-12T15:44:23.244Z level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="16.2 GiB" time=2025-03-12T15:44:23.244Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="1.1 GiB" time=2025-03-12T15:44:35.063Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding" time=2025-03-12T15:44:48.424Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-03-12T15:44:49.527Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding" time=2025-03-12T15:44:49.837Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server loading model" time=2025-03-12T15:44:53.998Z level=INFO source=server.go:619 msg="waiting for server to become available" status="llm server not responding" time=2025-03-12T15:44:55.197Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: signal: killed" ``` I'm using the latest docker image

GiteaMirror commented

2026-05-04 13:42:03 -05:00

@rick-github commented on GitHub (Mar 12, 2025):

Do you have a log from using ollama run?

yes. i posted both here #9685 (comment)

Two different models, (gemma3:27b for ollama run, gemma3:12b for ollama api) and different context windows require different memory allocations:
run, 27b, context=2048:

time=2025-03-12T14:42:26.764Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=49 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="18.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="992.0 MiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="15.3 GiB" memory.weights.repeating="14.2 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB"

time=2025-03-12T14:42:26.819Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 --ctx-size 2048 --batch-size 512 --n-gpu-layers 49 --threads 6 --parallel 1 --port 38197"

api, 12b, context=32000:

time=2025-03-12T14:23:45.354Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB"

time=2025-03-12T14:23:29.294Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 --ctx-size 32000 --batch-size 512 --n-gpu-layers 33 --threads 6 --parallel 1 --port 36283"

In both cases the GPU is full, and in the case of 12b, it's predominantly KV cache. It could be that ollama is underestimating the size of the KV cache, and 32000 tokens is actually a bit more than 11.7G. To mitigate this problem you can try reducing the context window, or adding some GPU overhead, see here.

@rick-github commented on GitHub (Mar 12, 2025): > > Do you have a log from using `ollama run`? > > yes. i posted both here [#9685 (comment)](https://github.com/ollama/ollama/issues/9685#issuecomment-2718169767) Two different models, (gemma3:27b for `ollama run`, gemma3:12b for ollama api) and different context windows require different memory allocations: run, 27b, context=2048: ``` time=2025-03-12T14:42:26.764Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=49 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="18.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="992.0 MiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="15.3 GiB" memory.weights.repeating="14.2 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" time=2025-03-12T14:42:26.819Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 --ctx-size 2048 --batch-size 512 --n-gpu-layers 49 --threads 6 --parallel 1 --port 38197" ``` api, 12b, context=32000: ``` time=2025-03-12T14:23:45.354Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB" time=2025-03-12T14:23:29.294Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 --ctx-size 32000 --batch-size 512 --n-gpu-layers 33 --threads 6 --parallel 1 --port 36283" ``` In both cases the GPU is full, and in the case of 12b, it's predominantly KV cache. It could be that ollama is underestimating the size of the KV cache, and 32000 tokens is actually a bit more than 11.7G. To mitigate this problem you can try reducing the context window, or adding some GPU overhead, see [here](https://github.com/ollama/ollama/issues/8597#issuecomment-2614533288).

GiteaMirror commented

2026-05-04 13:42:06 -05:00

@nachocheeseburger commented on GitHub (Mar 12, 2025):

Do you have a log from using ollama run?

yes. i posted both here #9685 (comment)

Two different models, (gemma3:27b for ollama run, gemma3:12b for ollama api) and different context windows require different memory allocations: run, 27b, context=2048:

time=2025-03-12T14:42:26.764Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=49 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="18.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="992.0 MiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="15.3 GiB" memory.weights.repeating="14.2 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB"

time=2025-03-12T14:42:26.819Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 --ctx-size 2048 --batch-size 512 --n-gpu-layers 49 --threads 6 --parallel 1 --port 38197"

api, 12b, context=32000:

time=2025-03-12T14:23:45.354Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB"

time=2025-03-12T14:23:29.294Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 --ctx-size 32000 --batch-size 512 --n-gpu-layers 33 --threads 6 --parallel 1 --port 36283"

In both cases the GPU is full, and in the case of 12b, it's predominantly KV cache. It could be that ollama is underestimating the size of the KV cache, and 32000 tokens is actually a bit more than 11.7G. To mitigate this problem you can try reducing the context window, or adding some GPU overhead, see here.

this solved my issues. thank you

@nachocheeseburger commented on GitHub (Mar 12, 2025): > > > Do you have a log from using `ollama run`? > > > > > > yes. i posted both here [#9685 (comment)](https://github.com/ollama/ollama/issues/9685#issuecomment-2718169767) > > Two different models, (gemma3:27b for `ollama run`, gemma3:12b for ollama api) and different context windows require different memory allocations: run, 27b, context=2048: > > ``` > time=2025-03-12T14:42:26.764Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=49 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="18.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="992.0 MiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="15.3 GiB" memory.weights.repeating="14.2 GiB" memory.weights.nonrepeating="1.1 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" > > time=2025-03-12T14:42:26.819Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-afa0ea2ef463c87a1eebb9af070e76a353107493b5d9a62e5e66f65a65409541 --ctx-size 2048 --batch-size 512 --n-gpu-layers 49 --threads 6 --parallel 1 --port 38197" > ``` > > api, 12b, context=32000: > > ``` > time=2025-03-12T14:23:45.354Z level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=33 layers.split="" memory.available="[14.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.8 GiB" memory.required.partial="14.3 GiB" memory.required.kv="11.7 GiB" memory.required.allocations="[14.3 GiB]" memory.weights.total="17.7 GiB" memory.weights.repeating="17.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.3 GiB" > > time=2025-03-12T14:23:29.294Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/arch/Desktop/AIstuff/llms/models/ollama/blobs/sha256-adca500fad9b54c565ae672184e0c9eb690eb6014ba63f8ec13849d4f73a32d3 --ctx-size 32000 --batch-size 512 --n-gpu-layers 33 --threads 6 --parallel 1 --port 36283" > ``` > > In both cases the GPU is full, and in the case of 12b, it's predominantly KV cache. It could be that ollama is underestimating the size of the KV cache, and 32000 tokens is actually a bit more than 11.7G. To mitigate this problem you can try reducing the context window, or adding some GPU overhead, see [here](https://github.com/ollama/ollama/issues/8597#issuecomment-2614533288). this solved my issues. thank you

GiteaMirror commented

2026-05-04 13:42:15 -05:00

@vladkol commented on GitHub (Mar 12, 2025):

OLLAMA_FLASH_ATTENTION=1 helps with 27B model.

@vladkol commented on GitHub (Mar 12, 2025): `OLLAMA_FLASH_ATTENTION=1` helps with 27B model.

GiteaMirror commented

2026-05-04 13:42:21 -05:00

@JulienDeveaux commented on GitHub (Mar 12, 2025):

Looking at dmesg, it is a out of memory (which is weird with 24Gb of VRAM, the model should fit)

[  +0,000003] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-0aab9aad7947b862405470f498b29ccd75bb6549dfc297403131d969e519e529.scope,task=ollama,pid=69905,uid=0
[  +0,001179] Out of memory: Killed process 69905 (ollama) total-vm:56178848kB, anon-rss:14404524kB, file-rss:93472kB, shmem-rss:8192kB, UID:0 pgtables:37812kB oom_score_adj:0

Where can I set this flag ?
I don't see any docs about flags

@JulienDeveaux commented on GitHub (Mar 12, 2025): Looking at dmesg, it is a out of memory (which is weird with 24Gb of VRAM, the model should fit) ``` [ +0,000003] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-0aab9aad7947b862405470f498b29ccd75bb6549dfc297403131d969e519e529.scope,task=ollama,pid=69905,uid=0 [ +0,001179] Out of memory: Killed process 69905 (ollama) total-vm:56178848kB, anon-rss:14404524kB, file-rss:93472kB, shmem-rss:8192kB, UID:0 pgtables:37812kB oom_score_adj:0 ``` Where can I set this flag ? I don't see any docs about flags

GiteaMirror commented

2026-05-04 13:42:23 -05:00

@vladkol commented on GitHub (Mar 12, 2025):

@JulienDeveaux https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-enable-flash-attention

@vladkol commented on GitHub (Mar 12, 2025): @JulienDeveaux https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-enable-flash-attention

GiteaMirror commented

2026-05-04 13:42:25 -05:00

@rick-github commented on GitHub (Mar 12, 2025):

I don't think it's a VRAM issue, the RSS for the ollama process reached 14G and your system has (15.6 + 2.7) 18.3 total, so the kernel killed the process. It looks like 0.6.0 + gemma3 is an overenthusiastic memory allocater, likely needs attention from the devs.

@rick-github commented on GitHub (Mar 12, 2025): I don't think it's a VRAM issue, the RSS for the ollama process reached 14G and your system has (15.6 + 2.7) 18.3 total, so the kernel killed the process. It looks like 0.6.0 + gemma3 is an overenthusiastic memory allocater, likely needs attention from the devs.

GiteaMirror commented

2026-05-04 13:42:27 -05:00

@JulienDeveaux commented on GitHub (Mar 12, 2025):

@JulienDeveaux https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-enable-flash-attention

Oooh thank you

I don't think it's a VRAM issue, the RSS for the ollama process reached 14G and your system has (15.6 + 2.7) 18.3 total, so the kernel killed the process. It looks like 0.6.0 + gemma3 is an overenthusiastic memory allocater, likely needs attention from the devs.

Thant you for the insight. I'll keep an eye out for future releases

@JulienDeveaux commented on GitHub (Mar 12, 2025): > [@JulienDeveaux](https://github.com/JulienDeveaux) https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-enable-flash-attention Oooh thank you > I don't think it's a VRAM issue, the RSS for the ollama process reached 14G and your system has (15.6 + 2.7) 18.3 total, so the kernel killed the process. It looks like 0.6.0 + gemma3 is an overenthusiastic memory allocater, likely needs attention from the devs. Thant you for the insight. I'll keep an eye out for future releases

GiteaMirror commented

2026-05-04 13:42:28 -05:00

@ZecaStevenson commented on GitHub (Mar 18, 2025):

I'm using the API: /api/generate

And having the same problem with gemma3:27b:
{"error":"POST predict: Post "http://127.0.0.1:37283/completion": EOF"}

It also happen when using Ollama's command line tool:
Error: POST predict: Post "http://127.0.0.1:32931/completion": EOF

If I try it with another model (even larger ones like qwen2.5:32b and qwq) everything works fine.

@ZecaStevenson commented on GitHub (Mar 18, 2025): I'm using the API: /api/generate And having the same problem with gemma3:27b: {"error":"POST predict: Post \"http://127.0.0.1:37283/completion\": EOF"} It also happen when using Ollama's command line tool: Error: POST predict: Post "http://127.0.0.1:32931/completion": EOF If I try it with another model (even larger ones like qwen2.5:32b and qwq) everything works fine.

Sign in to join this conversation.

Branches Tags

main

dhiltgen/ci

parth-launch-plan-gating

hoyyeva/anthropic-reference-images-path

parth-anthropic-reference-images-path

brucemacd/download-before-remove

hoyyeva/editor-config-repair

parth-mlx-decode-checkpoints

parth-launch-codex-app

hoyyeva/fix-codex-model-metadata-warning

hoyyeva/qwen

parth/hide-claude-desktop-till-release

hoyyeva/opencode-image-modality

parth-add-claude-code-autoinstall

release_v0.22.0

pdevine/manifest-list

codex/fix-codex-model-metadata-warning

pdevine/addressable-manifest

brucemacd/launch-fetch-reccomended

jmorganca/llama-compat

launch-copilot-cli

hoyyeva/opencode-thinking

release_v0.20.7

parth-auto-save-backup

parth-test

jmorganca/gemma4-audio-replacements

fix-manifest-digest-on-pull

hoyyeva/vscode-improve

brucemacd/install-server-wait

parth/update-claude-docs

brucemac/start-ap-install

pdevine/mlx-update

pdevine/qwen35_vision

drifkin/api-show-fallback

mintlify/image-generation-1773352582

hoyyeva/server-context-length-local-config

jmorganca/faster-reptition-penalties

jmorganca/convert-nemotron

parth-pi-thinking

pdevine/sampling-penalties

jmorganca/fix-create-quantization-memory

dongchen/resumable_transfer_fix

pdevine/sampling-cache-error

jessegross/mlx-usage

hoyyeva/openclaw-config

hoyyeva/app-html

pdevine/qwen3next

brucemacd/sign-sh-install

brucemacd/tui-update

brucemacd/usage-api

jmorganca/launch-empty

fix-app-dist-embed

mxyng/mlx-compile

mxyng/mlx-quant

mxyng/mlx-glm4.7

mxyng/mlx

brucemacd/simplify-model-picker

jmorganca/qwen3-concurrent

fix-glm-4.7-flash-mla-config

drifkin/qwen3-coder-opening-tag

brucemacd/usage-cli

fix-cuda12-fattn-shmem

ollama-imagegen-docs

parth/fix-multiline-inputs

brucemacd/config-docs

mxyng/model-files

mxyng/simple-execute

fix-imagegen-ollama-models

mxyng/async-upload

jmorganca/lazy-no-dtype-changes

imagegen-auto-detect-create

parth/decrease-concurrent-download-hf

fix-mlx-quantize-init

jmorganca/x-cleanup

usage

imagegen-readme

jmorganca/glm-image

mlx-gpu-cd

jmorganca/imagegen-modelfile

parth/agent-skills

parth/agent-allowlist

parth/signed-in-offline

parth/agents

parth/fix-context-chopping

improve-cloud-flow

parth/add-models-websearch

parth/prompt-renderer-mcp

jmorganca/native-settings

jmorganca/download-stream-hash

jmorganca/client2-rebased

brucemacd/oai-chat-req-multipart

jessegross/multi_chunk_reserve

grace/additional-omit-empty

grace/mistral-3-large

mxyng/tokenizer2

mxyng/tokenizer

jessegross/flash

hoyyeva/windows-nacked-app

mxyng/cleanup-attention

grace/deepseek-parser

hoyyeva/remember-unsent-prompt

parth/add-lfs-pointer-error-conversion

parth/olmo2-test2

hoyyeva/ollama-launchagent-plist

nicole/olmo-model

parth/olmo-test

mxyng/remove-embedded

parth/render-template

jmorganca/intellect-3

parth/remove-prealloc-linter

jmorganca/cmd-eval

nicole/nomic-embed-text-fix

mxyng/lint-2

hoyyeva/add-gemini-3-pro-preview

hoyyeva/load-model-list

mxyng/expand-path

mxyng/environ-2

hoyyeva/deeplink-json-encoding

parth/improve-tool-calling-tests

hoyyeva/conversation

hoyyeva/assistant-edit-response

hoyyeva/thinking

origin/brucemacd/invalid-char-i-err

parth/improve-tool-calling

jmorganca/required-omitempty

grace/qwen3-vl-tests

mxyng/iter-client

parth/docs-readme

nicole/embed-test

pdevine/integration-benchstat

parth/remove-generate-cmd

parth/add-toolcall-id

mxyng/server-tests

jmorganca/glm-4.6

jmorganca/gin-h-compat

drifkin/stable-tool-args

pdevine/qwen3-more-thinking

parth/add-websearch-client

nicole/websearch_local

jmorganca/qwen3-coder-updates

grace/deepseek-v3-migration-tests

mxyng/fix-create

jmorganca/cloud-errors

pdevine/parser-tidy

revert-12233-parth/simplify-entrypoints-runner

parth/enable-so-gpt-oss

brucemacd/qwen3vl

jmorganca/readme-simplify

parth/gpt-oss-structured-outputs

revert-12039-jmorganca/tools-braces

mxyng/embeddings

mxyng/gguf

mxyng/benchmark

mxyng/types-null

parth/move-parsing

mxyng/gemma2

jmorganca/docs

mxyng/16-bit

mxyng/create-stdin

pdevine/authorizedkeys

mxyng/quant

parth/opt-in-error-context-window

brucemacd/cache-models

brucemacd/runner-completion

jmorganca/llama-update-6

brucemacd/benchmark-list

brucemacd/partial-read-caps

parth/deepseek-r1-tools

mxyng/omit-array

parth/tool-prefix-temp

brucemacd/runner-test

jmorganca/qwen25vl

brucemacd/model-forward-test-ext

parth/python-function-parsing

jmorganca/cuda-compression-none

drifkin/num-parallel

drifkin/chat-truncation-fix

jmorganca/sync

parth/python-tools-calling

drifkin/array-head-count

brucemacd/create-no-loop

parth/server-enable-content-stream-with-tools

qwen25omni

mxyng/v3

brucemacd/ropeconfig

jmorganca/silence-tokenizer

parth/sample-so-test

parth/sampling-structured-outputs

brucemacd/doc-go-engine

parth/constrained-sampling-json

jmorganca/mistral-wip

brucemacd/mistral-small-convert

parth/sample-unmarshal-json-for-params

brucemacd/jomorganca/mistral

pdevine/bfloat16

jmorganca/mistral

brucemacd/mistral

pdevine/logging

parth/sample-correctness-fix

parth/sample-fix-sorting

jmorgan/sample-fix-sorting-extras

jmorganca/temp-0-images

brucemacd/parallel-embed-models

brucemacd/shim-grammar

jmorganca/fix-gguf-error

bmizerany/nameswork

jmorganca/faster-releases

bmizerany/validatenames

brucemacd/err-no-vocab

brucemacd/rope-config

brucemacd/err-hint

brucemacd/qwen2_5

brucemacd/logprobs

brucemacd/new_runner_graph_bench

progress-flicker

brucemacd/forward-test

brucemacd/go_qwen2

pdevine/gemma2

jmorganca/add-missing-symlink-eval

mxyng/next-debug

parth/set-context-size-openai

brucemacd/next-bpe-bench

brucemacd/next-bpe-test

brucemacd/new_runner_e2e

brucemacd/new_runner_qwen2

pdevine/convert-cohere2

brucemacd/convert-cli

parth/log-probs

mxyng/next-mlx

mxyng/cmd-history

parth/templating

parth/tokenize-detokenize

brucemacd/check-key-register

bmizerany/grammar

jmorganca/vendor-081b29bd

mxyng/func-checks

jmorganca/fix-null-format

parth/fix-default-to-warn-json

jmorganca/qwen2vl

jmorganca/no-concat

parth/cmd-cleanup-SO

brucemacd/check-key-register-structured-err

parth/openai-stream-usage

parth/fix-referencing-so

stream-tools-stop

jmorganca/degin-1

brucemacd/install-path-clean

brucemacd/push-name-validation

brucemacd/browser-key-register

jmorganca/openai-fix-first-message

jmorganca/fix-proxy

jessegross/sample

parth/disallow-streaming-tools

dhiltgen/remove_submodule

jmorganca/ga

jmorganca/mllama

pdevine/newlines

pdevine/geems-2b

jmorganca/llama-bump

mxyng/modelname-7

mxyng/gin-slog

mxyng/modelname-6

jyan/convert-prog

jyan/quant5

paligemma-support

pdevine/import-docs

jmorganca/openai-context

jyan/paligemma

jyan/p2

jyan/palitest

bmizerany/embedspeedup

jmorganca/llama-vit

brucemacd/allow-ollama

royh/ep-methods

royh/whisper

mxyng/api-models

mxyng/fix-memory

jyan/q4_4/8

jyan/ollama-v

royh/stream-tools

roy-embed-parallel

bmizerany/hrm

revert-5963-revert-5924-mxyng/llama3.1-rope

royh/embed-viz

jyan/local2

jyan/auth

jyan/local

jyan/parse-temp

jmorganca/template-mistral

jyan/reord-g

royh-openai-suffixdocs

royh-imgembed

royh-embed-parallel

jyan/quant4

royh-precision

jyan/progress

pdevine/fix-template

jyan/quant3

pdevine/ggla

mxyng/update-registry-domain

jmorganca/ggml-static

mxyng/create-context

jyan/v0.146

mxyng/layers-from-files

build_dist

bmizerany/noseek

royh-ls

royh-name

timeout

mxyng/server-timestamp

bmizerany/nosillyggufslurps

royh-params

jmorganca/llama-cpp-7c26775

royh-openai-delete

royh-show-rigid

jmorganca/enable-fa

jmorganca/no-error-template

jyan/format

royh-testdelete

bmizerany/fastverify

language_support

pdevine/ps-glitches

brucemacd/tokenize

bruce/iq-quants

bmizerany/filepathwithcoloninhost

mxyng/split-bin

bmizerany/client-registry

jmorganca/if-none-match

native

jmorganca/native

jmorganca/batch-embeddings

jmorganca/initcmake

jmorganca/mm

pdevine/showggmlinfo

modenameenforcealphanum

bmizerany/modenameenforcealphanum

jmorganca/done-reason

jmorganca/llama-cpp-8960fe8

ollama.com

bmizerany/filepathnobuild

bmizerany/types/model/defaultfix

rmdisplaylong

nogogen

bmizerany/x

modelfile-readme

bmizerany/replacecolon

jmorganca/limit

jmorganca/execstack

jmorganca/replace-assets

mxyng/tune-concurrency

jmorganca/testing

whitespace-detection

jmorganca/options

upgrade-all

scratch

cuda-search

mattw/airenamer

mattw/allmodelsonhuggingface

mattw/quantcontext

mattw/whatneedstorun

brucemacd/llama-mem-calc

mattw/faq-context

mattw/communitylinks

mattw/noprune

mattw/python-functioncalling

rename

mxyng/install

pulse

remove-first

editor

mattw/selfqueryingretrieval

cgo

mattw/howtoquant

api

matt/streamingapi

format-config

mxyng/extra-args

shell

update-nous-hermes

cp-model

upload-progress

fix-unknown-model

fix-model-names

delete-fix

insecure-registry

ls

deletemodels

progressbar

readme-updates

license-layers

skip-list

list-models

modelpath

matt/examplemodelfiles

distribution

go-opts

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: github-starred/ollama#68378