[GH-ISSUE #11434] crash when running gemma3 model and asking it to calculate the time between two dates. #54061

Closed
opened 2026-04-29 05:10:08 -05:00 by GiteaMirror · 5 comments

Originally created by @abcbarryn on GitHub (Jul 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11434

What is the issue?

I asked the model to calculate the length of time between July 15th, 2025 and July 25th, 2025, and it started stuttering and then crashed.
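For reference, the calculation the model was asked to perform is trivial to verify outside the model. A minimal sketch using Python's standard library, with the two dates taken from the report:

```python
from datetime import date

# The two dates from the prompt that triggered the issue.
start = date(2025, 7, 15)
end = date(2025, 7, 25)

# Subtracting two dates yields a timedelta.
delta = end - start
print(delta.days)  # → 10
```

The expected answer is 10 days, so the failure is in the model/runner behavior, not in any ambiguity of the question.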

Relevant log output

Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.026-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.029-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41521"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1

OS

SuSE Linux

GPU

None

CPU

Two CPUs with 14 cores each.

Ollama version

ollama version is 0.9.6

GiteaMirror added the bug label 2026-04-29 05:10:08 -05:00

@rick-github commented on GitHub (Jul 15, 2025):

The log doesn't show a crash.


@abcbarryn commented on GitHub (Jul 15, 2025):

Maybe the model crashed? Something crashed. The terminal was stuck in a stuttering output loop until I pressed Ctrl-C.


@rick-github commented on GitHub (Jul 15, 2025):

Perhaps if you add more than 4 lines of log.


@abcbarryn commented on GitHub (Jul 15, 2025):

Jul 15 01:06:05 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:05 | 200 |       43.89µs |       127.0.0.1 | HEAD     "/"
Jul 15 01:06:05 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:05 | 200 |      34.607µs |       127.0.0.1 | GET      "/api/ps"
Jul 15 01:06:15 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:15 | 200 |      34.303µs |       127.0.0.1 | HEAD     "/"
Jul 15 01:06:15 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:15 | 200 |  199.449466ms |       127.0.0.1 | POST     "/api/show"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.135-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.7 GiB" free_swap="29.7 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.138-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.236-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 33357"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.237-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.237-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.237-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.253-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.254-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:33357"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.344-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 01:06:16 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.351-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.489-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.666-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.962-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:06:17 amobile ollama[17467]: time=2025-07-15T01:06:17.762-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.53 seconds"
Jul 15 01:06:17 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:17 | 200 |   1.92506674s |       127.0.0.1 | POST     "/api/generate"
Jul 15 01:09:41 amobile ollama[17467]: [GIN] 2025/07/15 - 01:09:41 | 200 |         2m42s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:12:48 amobile ollama[17467]: [GIN] 2025/07/15 - 01:12:48 | 200 |          2m0s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:19:34 amobile ollama[17467]: [GIN] 2025/07/15 - 01:19:34 | 200 |         2m42s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.026-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.029-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41521"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.149-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.168-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.168-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:41521"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.273-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 01:24:42 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.280-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.285-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.286-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.286-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.286-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.401-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.570-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.747-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:24:43 amobile ollama[17467]: time=2025-07-15T01:24:43.911-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.76 seconds"
Jul 15 01:27:35 amobile ollama[17467]: [GIN] 2025/07/15 - 01:27:35 | 200 |         2m53s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:33:05 amobile ollama[17467]: [GIN] 2025/07/15 - 01:33:05 | 200 |  37.98540726s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.619-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.621-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.716-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41437"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.716-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.716-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.717-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.734-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.736-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:41437"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.827-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 01:40:16 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.834-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.968-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 01:40:17 amobile ollama[17467]: time=2025-07-15T01:40:17.113-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:40:17 amobile ollama[17467]: time=2025-07-15T01:40:17.288-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:40:18 amobile ollama[17467]: time=2025-07-15T01:40:18.226-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.51 seconds"
Jul 15 01:41:38 amobile ollama[17467]: [GIN] 2025/07/15 - 01:41:38 | 200 |         1m22s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:44:40 amobile ollama[17467]: [GIN] 2025/07/15 - 01:44:40 | 200 |          1m3s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:49:32 amobile ollama[17467]: [GIN] 2025/07/15 - 01:49:32 | 200 |   26.3552658s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:00:49 amobile ollama[17467]: [GIN] 2025/07/15 - 02:00:49 | 200 |      39.415µs |       127.0.0.1 | HEAD     "/"
Jul 15 02:00:49 amobile ollama[17467]: [GIN] 2025/07/15 - 02:00:49 | 200 |  182.084631ms |       127.0.0.1 | POST     "/api/show"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.063-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.065-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.166-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 45897"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.167-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.167-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.167-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.185-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.185-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:45897"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.274-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 02:00:50 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.281-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.286-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.287-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.287-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.287-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.419-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.575-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.848-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 02:00:51 amobile ollama[17467]: time=2025-07-15T02:00:51.684-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.52 seconds"
Jul 15 02:00:51 amobile ollama[17467]: [GIN] 2025/07/15 - 02:00:51 | 200 |  1.910645146s |       127.0.0.1 | POST     "/api/generate"
Jul 15 02:04:07 amobile ollama[17467]: [GIN] 2025/07/15 - 02:04:07 | 200 |         2m49s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:06:47 amobile ollama[17467]: [GIN] 2025/07/15 - 02:06:47 | 200 |         2m12s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:10:37 amobile ollama[17467]: [GIN] 2025/07/15 - 02:10:37 | 200 |         2m24s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:13:36 amobile ollama[17467]: [GIN] 2025/07/15 - 02:13:36 | 200 |         1m45s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:18:37 amobile ollama[17467]: [GIN] 2025/07/15 - 02:18:37 | 200 |         1m29s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:20:22 amobile ollama[17467]: [GIN] 2025/07/15 - 02:20:22 | 200 | 24.101449501s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:22:28 amobile ollama[17467]: [GIN] 2025/07/15 - 02:22:28 | 200 |         1m16s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:24:37 amobile ollama[17467]: [GIN] 2025/07/15 - 02:24:37 | 200 |         1m15s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:26:57 amobile ollama[17467]: [GIN] 2025/07/15 - 02:26:57 | 200 |         1m14s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:31:07 amobile ollama[17467]: [GIN] 2025/07/15 - 02:31:07 | 200 |         2m13s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:32:33 amobile ollama[17467]: [GIN] 2025/07/15 - 02:32:33 | 200 | 11.564672162s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:36:32 amobile ollama[17467]: [GIN] 2025/07/15 - 02:36:32 | 200 |         2m21s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:41:53 amobile ollama[17467]: [GIN] 2025/07/15 - 02:41:53 | 200 | 46.782257017s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:43:59 amobile ollama[17467]: [GIN] 2025/07/15 - 02:43:59 | 200 |          1m3s |       127.0.0.1 | POST     "/api/chat"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.463-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.0 GiB"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.465-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.561-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41967"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.562-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.562-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.562-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.577-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.577-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:41967"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.664-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 10:58:17 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.671-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.814-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.966-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 10:58:18 amobile ollama[17467]: time=2025-07-15T10:58:18.140-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 10:58:19 amobile ollama[17467]: time=2025-07-15T10:58:19.581-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds"
Jul 15 10:58:23 amobile ollama[17467]: [GIN] 2025/07/15 - 10:58:23 | 200 |  6.619394623s |       127.0.0.1 | POST     "/api/chat"
Jul 15 10:58:51 amobile ollama[17467]: [GIN] 2025/07/15 - 10:58:51 | 200 |  2.678771024s |       127.0.0.1 | POST     "/api/chat"
Jul 15 10:59:28 amobile ollama[17467]: [GIN] 2025/07/15 - 10:59:28 | 200 | 16.793880245s |       127.0.0.1 | POST     "/api/chat"
Jul 15 11:01:09 amobile ollama[17467]: [GIN] 2025/07/15 - 11:01:09 | 200 |  5.583076216s |       127.0.0.1 | POST     "/api/chat"
Jul 15 12:24:46 amobile ollama[17467]: [GIN] 2025/07/15 - 12:24:46 | 200 |      56.614µs |       127.0.0.1 | HEAD     "/"
Jul 15 12:24:46 amobile ollama[17467]: [GIN] 2025/07/15 - 12:24:46 | 200 |   187.47452ms |       127.0.0.1 | POST     "/api/show"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.225-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.0 GiB"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.227-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 38447"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.340-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.341-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:38447"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.430-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 12:24:47 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.437-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.576-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.743-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 12:24:48 amobile ollama[17467]: time=2025-07-15T12:24:48.011-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 12:24:49 amobile ollama[17467]: time=2025-07-15T12:24:49.594-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.27 seconds"
Jul 15 12:24:49 amobile ollama[17467]: [GIN] 2025/07/15 - 12:24:49 | 200 |  2.646873422s |       127.0.0.1 | POST     "/api/generate"
Jul 15 12:28:09 amobile ollama[17467]: [GIN] 2025/07/15 - 12:28:09 | 200 |         2m41s |       127.0.0.1 | POST     "/api/chat"
Jul 15 12:30:20 amobile ollama[17467]: [GIN] 2025/07/15 - 12:30:20 | 200 | 46.245741541s |       127.0.0.1 | POST     "/api/chat"
Jul 15 12:31:18 amobile ollama[17467]: [GIN] 2025/07/15 - 12:31:18 | 200 | 40.434622179s |       127.0.0.1 | POST     "/api/chat"
Jul 15 12:32:04 amobile ollama[17467]: [GIN] 2025/07/15 - 12:32:04 | 200 | 11.571502803s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.853-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.3 GiB" free_swap="30.1 GiB"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.855-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 43873"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.975-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.976-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:43873"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.077-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 14:06:00 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.084-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.207-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.381-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.559-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 14:06:01 amobile ollama[17467]: time=2025-07-15T14:06:01.977-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds"
Jul 15 14:06:14 amobile ollama[17467]: [GIN] 2025/07/15 - 14:06:14 | 200 | 15.243695865s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:08:49 amobile ollama[17467]: [GIN] 2025/07/15 - 14:08:49 | 200 |   7.81688474s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:10:32 amobile ollama[17467]: [GIN] 2025/07/15 - 14:10:32 | 200 |   5.57302473s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:11:52 amobile ollama[17467]: [GIN] 2025/07/15 - 14:11:52 | 200 |  4.563861397s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:12:21 amobile ollama[17467]: [GIN] 2025/07/15 - 14:12:21 | 200 |  7.445482222s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:13:40 amobile ollama[17467]: [GIN] 2025/07/15 - 14:13:40 | 200 |  8.141572187s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:15:38 amobile ollama[17467]: [GIN] 2025/07/15 - 14:15:38 | 200 | 11.704692526s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:17:23 amobile ollama[17467]: [GIN] 2025/07/15 - 14:17:23 | 200 |  8.167364087s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:19:24 amobile ollama[17467]: [GIN] 2025/07/15 - 14:19:24 | 200 |  8.725596273s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:20:54 amobile ollama[17467]: [GIN] 2025/07/15 - 14:20:54 | 200 |  7.287166608s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:23:09 amobile ollama[17467]: [GIN] 2025/07/15 - 14:23:09 | 200 |  7.923928608s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:28:00 amobile ollama[17467]: [GIN] 2025/07/15 - 14:28:00 | 200 |  9.638210214s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:30:20 amobile ollama[17467]: [GIN] 2025/07/15 - 14:30:20 | 200 | 11.398374606s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:32:11 amobile ollama[17467]: [GIN] 2025/07/15 - 14:32:11 | 200 |  6.043563297s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:35:04 amobile ollama[17467]: [GIN] 2025/07/15 - 14:35:04 | 200 | 10.856669259s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:38:09 amobile ollama[17467]: [GIN] 2025/07/15 - 14:38:09 | 200 |  10.15686096s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:39:15 amobile ollama[17467]: [GIN] 2025/07/15 - 14:39:15 | 200 |  6.940100831s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:41:27 amobile ollama[17467]: [GIN] 2025/07/15 - 14:41:27 | 200 | 12.226914926s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:44:28 amobile ollama[17467]: [GIN] 2025/07/15 - 14:44:28 | 200 | 23.563901162s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:48:26 amobile ollama[17467]: [GIN] 2025/07/15 - 14:48:26 | 200 | 33.467552018s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:50:38 amobile ollama[17467]: [GIN] 2025/07/15 - 14:50:38 | 200 | 32.005724467s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:52:51 amobile ollama[17467]: [GIN] 2025/07/15 - 14:52:51 | 200 | 34.606237099s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:55:03 amobile ollama[17467]: [GIN] 2025/07/15 - 14:55:03 | 200 | 39.147461074s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:57:11 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:11 | 200 | 37.597258006s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:57:45 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:45 | 200 |      60.643µs |       127.0.0.1 | HEAD     "/"
Jul 15 14:57:45 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:45 | 200 |      75.829µs |       127.0.0.1 | GET      "/api/ps"
Jul 15 14:57:49 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:49 | 200 |  5.326923899s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:58:40 amobile ollama[17467]: [GIN] 2025/07/15 - 14:58:40 | 200 |      39.146µs |       127.0.0.1 | HEAD     "/"
Jul 15 14:58:40 amobile ollama[17467]: [GIN] 2025/07/15 - 14:58:40 | 200 |       37.34µs |       127.0.0.1 | GET      "/api/ps"
Jul 15 15:31:24 amobile ollama[17467]: [GIN] 2025/07/15 - 15:31:24 | 200 |      39.551µs |       127.0.0.1 | HEAD     "/"
Jul 15 15:31:24 amobile ollama[17467]: [GIN] 2025/07/15 - 15:31:24 | 200 |  193.354766ms |       127.0.0.1 | POST     "/api/show"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.125-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.127-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 35539"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.237-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.237-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:35539"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.331-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 15:31:25 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.337-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.475-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.638-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.913-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 15:31:27 amobile ollama[17467]: time=2025-07-15T15:31:27.248-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds"
Jul 15 15:31:27 amobile ollama[17467]: [GIN] 2025/07/15 - 15:31:27 | 200 |  2.458372036s |       127.0.0.1 | POST     "/api/generate"
Jul 15 15:32:06 amobile ollama[17467]: [GIN] 2025/07/15 - 15:32:06 | 200 | 12.149772723s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:32:50 amobile ollama[17467]: [GIN] 2025/07/15 - 15:32:50 | 200 | 19.018623689s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:33:40 amobile ollama[17467]: [GIN] 2025/07/15 - 15:33:40 | 200 |  19.08334206s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:34:59 amobile ollama[17467]: [GIN] 2025/07/15 - 15:34:59 | 200 | 26.413488556s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:37:22 amobile ollama[17467]: [GIN] 2025/07/15 - 15:37:22 | 200 | 27.871657486s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:38:35 amobile ollama[17467]: [GIN] 2025/07/15 - 15:38:35 | 200 | 27.072325445s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:40:25 amobile ollama[17467]: [GIN] 2025/07/15 - 15:40:25 | 200 | 38.748835117s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:42:57 amobile ollama[17467]: [GIN] 2025/07/15 - 15:42:57 | 200 |  3.887759995s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:47:15 amobile ollama[17467]: [GIN] 2025/07/15 - 15:47:15 | 200 | 32.823251665s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:47:48 amobile ollama[17467]: [GIN] 2025/07/15 - 15:47:48 | 200 |  8.797014247s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:48:16 amobile ollama[17467]: [GIN] 2025/07/15 - 15:48:16 | 200 | 23.792104523s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:48:38 amobile ollama[17467]: [GIN] 2025/07/15 - 15:48:38 | 200 |   9.21259348s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:49:31 amobile ollama[17467]: [GIN] 2025/07/15 - 15:49:31 | 200 | 21.710448987s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:49:35 amobile ollama[17467]: [GIN] 2025/07/15 - 15:49:35 | 200 | 32.825344864s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:49:40 amobile ollama[17467]: [GIN] 2025/07/15 - 15:49:40 | 200 |  7.626245474s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:50:47 amobile ollama[17467]: [GIN] 2025/07/15 - 15:50:47 | 200 |  6.319395124s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:50:55 amobile ollama[17467]: [GIN] 2025/07/15 - 15:50:55 | 200 | 16.482319877s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.011-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.1 GiB" free_swap="30.1 GiB"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.013-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.108-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 40683"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.108-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.109-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.109-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.136-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.136-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:40683"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.233-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 16:00:37 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.240-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.361-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.537-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.714-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:00:39 amobile ollama[17467]: time=2025-07-15T16:00:39.125-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds"
Jul 15 16:00:50 amobile ollama[17467]: [GIN] 2025/07/15 - 16:00:50 | 200 | 13.453382374s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:01:37 amobile ollama[17467]: [GIN] 2025/07/15 - 16:01:37 | 200 |  7.010397526s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:02:47 amobile ollama[17467]: [GIN] 2025/07/15 - 16:02:47 | 200 |  9.218088928s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:03:40 amobile ollama[17467]: [GIN] 2025/07/15 - 16:03:40 | 200 |  8.419104993s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:06:18 amobile ollama[17467]: [GIN] 2025/07/15 - 16:06:18 | 200 | 44.640608373s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:06:50 amobile ollama[17467]: [GIN] 2025/07/15 - 16:06:50 | 200 |  17.14751161s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:07:39 amobile ollama[17467]: [GIN] 2025/07/15 - 16:07:39 | 200 | 38.689371786s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:08:12 amobile ollama[17467]: [GIN] 2025/07/15 - 16:08:12 | 200 | 32.899070555s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:08:34 amobile ollama[17467]: time=2025-07-15T16:08:34.345-04:00 level=WARN source=runner.go:157 msg="truncating input prompt" limit=4096 prompt=19010 keep=4 new=4096
Jul 15 16:09:46 amobile ollama[17467]: [GIN] 2025/07/15 - 16:09:46 | 200 | 25.107300894s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:12:18 amobile ollama[17467]: [GIN] 2025/07/15 - 16:12:18 | 200 |         3m43s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:12:19 amobile ollama[17467]: [GIN] 2025/07/15 - 16:12:19 | 200 |         1m36s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:15:05 amobile ollama[17467]: [GIN] 2025/07/15 - 16:15:05 | 200 |         1m53s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:15:38 amobile ollama[17467]: [GIN] 2025/07/15 - 16:15:38 | 200 |         1m14s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:15:48 amobile ollama[17467]: [GIN] 2025/07/15 - 16:15:48 | 200 | 33.070491476s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:17:31 amobile ollama[17467]: [GIN] 2025/07/15 - 16:17:31 | 200 |          1m0s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:21:06 amobile ollama[17467]: [GIN] 2025/07/15 - 16:21:06 | 200 | 34.303595005s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:21:32 amobile ollama[17467]: panic: failed to decode batch: could not find a kv cache slot (cache: 2560 batch: 512)
Jul 15 16:21:32 amobile ollama[17467]: goroutine 8 [running]:
Jul 15 16:21:32 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0002b6900, {0x55867a095700, 0xc0000008c0})
Jul 15 16:21:32 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:364 +0x65
Jul 15 16:21:32 amobile ollama[17467]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
Jul 15 16:21:32 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74
Jul 15 16:21:32 amobile ollama[17467]: time=2025-07-15T16:21:32.337-04:00 level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:40683/completion\": EOF"
Jul 15 16:21:32 amobile ollama[17467]: [GIN] 2025/07/15 - 16:21:32 | 500 | 11.994152362s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.458-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.461-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.562-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 45949"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.562-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.562-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.563-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.581-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.582-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:45949"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.678-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 16:22:39 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.684-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.814-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.973-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:22:40 amobile ollama[17467]: time=2025-07-15T16:22:40.238-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:22:41 amobile ollama[17467]: time=2025-07-15T16:22:41.844-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.28 seconds"
Jul 15 16:23:21 amobile ollama[17467]: [GIN] 2025/07/15 - 16:23:21 | 200 | 42.358125818s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:35:49 amobile ollama[17467]: time=2025-07-15T16:35:49.962-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 16:35:49 amobile ollama[17467]: time=2025-07-15T16:35:49.964-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.062-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 36385"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.062-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.063-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.063-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.080-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.081-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:36385"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.178-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 16:35:50 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.186-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.314-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.481-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.656-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:35:52 amobile ollama[17467]: time=2025-07-15T16:35:52.094-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.03 seconds"
Jul 15 16:36:34 amobile ollama[17467]: [GIN] 2025/07/15 - 16:36:34 | 200 | 44.698539465s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:37:39 amobile ollama[17467]: [GIN] 2025/07/15 - 16:37:39 | 200 | 15.367562581s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:39:19 amobile ollama[17467]: [GIN] 2025/07/15 - 16:39:19 | 200 | 58.985695818s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:39:26 amobile ollama[17467]: [GIN] 2025/07/15 - 16:39:26 | 200 |          1m4s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:40:27 amobile ollama[17467]: [GIN] 2025/07/15 - 16:40:27 | 200 | 32.498111154s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:41:19 amobile ollama[17467]: [GIN] 2025/07/15 - 16:41:19 | 200 | 25.790679315s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:42:42 amobile ollama[17467]: [GIN] 2025/07/15 - 16:42:42 | 200 | 29.962983809s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:44:26 amobile ollama[17467]: [GIN] 2025/07/15 - 16:44:26 | 200 | 20.049622997s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:45:03 amobile ollama[17467]: [GIN] 2025/07/15 - 16:45:03 | 200 | 11.432715593s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:45:30 amobile ollama[17467]: [GIN] 2025/07/15 - 16:45:30 | 200 |  15.81967878s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:47:41 amobile ollama[17467]: [GIN] 2025/07/15 - 16:47:41 | 200 |         1m15s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:49:29 amobile ollama[17467]: [GIN] 2025/07/15 - 16:49:29 | 200 |          3m9s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:51:05 amobile ollama[17467]: [GIN] 2025/07/15 - 16:51:05 | 200 | 14.459985878s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:52:31 amobile ollama[17467]: panic: failed to decode batch: could not find a kv cache slot (cache: 2560 batch: 512)
Jul 15 16:52:31 amobile ollama[17467]: goroutine 14 [running]:
Jul 15 16:52:31 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc00065e900, {0x555e3641c700, 0xc0001308c0})
Jul 15 16:52:31 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:364 +0x65
Jul 15 16:52:31 amobile ollama[17467]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
Jul 15 16:52:31 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74
Jul 15 16:52:31 amobile ollama[17467]: time=2025-07-15T16:52:31.866-04:00 level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:36385/completion\": EOF"
Jul 15 16:52:31 amobile ollama[17467]: [GIN] 2025/07/15 - 16:52:31 | 200 | 13.142628839s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.611-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.613-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 39263"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.739-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.740-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:39263"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.836-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 16:53:23 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.842-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.976-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 16:53:24 amobile ollama[17467]: time=2025-07-15T16:53:24.139-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:53:24 amobile ollama[17467]: time=2025-07-15T16:53:24.409-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:53:26 amobile ollama[17467]: time=2025-07-15T16:53:26.009-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.28 seconds"
Jul 15 16:54:38 amobile ollama[17467]: [GIN] 2025/07/15 - 16:54:38 | 200 |         1m15s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:56:20 amobile ollama[17467]: [GIN] 2025/07/15 - 16:56:20 | 200 |          1m9s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:59:05 amobile ollama[17467]: [GIN] 2025/07/15 - 16:59:05 | 200 |         1m34s |       127.0.0.1 | POST     "/api/chat"
Jul 15 17:00:40 amobile ollama[17467]: [GIN] 2025/07/15 - 17:00:40 | 200 |         1m33s |       127.0.0.1 | POST     "/api/chat"
Jul 15 17:01:10 amobile ollama[17467]: [GIN] 2025/07/15 - 17:01:10 | 200 |     140.204µs |       127.0.0.1 | GET      "/api/version"
Jul 15 17:02:26 amobile ollama[17467]: [GIN] 2025/07/15 - 17:02:26 | 200 |         1m37s |       127.0.0.1 | POST     "/api/chat"
Jul 15 17:35:05 amobile ollama[17467]: [GIN] 2025/07/15 - 17:35:05 | 200 |      52.773µs |       127.0.0.1 | HEAD     "/"
Jul 15 17:35:05 amobile ollama[17467]: [GIN] 2025/07/15 - 17:35:05 | 200 |  204.188719ms |       127.0.0.1 | POST     "/api/show"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.674-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.676-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 36275"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.788-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.790-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:36275"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.880-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 17:35:05 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.888-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 17:35:06 amobile ollama[17467]: time=2025-07-15T17:35:06.025-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 17:35:06 amobile ollama[17467]: time=2025-07-15T17:35:06.174-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 17:35:06 amobile ollama[17467]: time=2025-07-15T17:35:06.447-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 17:35:08 amobile ollama[17467]: time=2025-07-15T17:35:08.049-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.27 seconds"
Jul 15 17:35:08 amobile ollama[17467]: [GIN] 2025/07/15 - 17:35:08 | 200 |  2.657473117s |       127.0.0.1 | POST     "/api/generate"
Jul 15 17:39:08 amobile ollama[17467]: [GIN] 2025/07/15 - 17:39:08 | 200 |          2m2s |       127.0.0.1 | POST     "/api/chat"
Jul 15 17:41:57 amobile ollama[17467]: [GIN] 2025/07/15 - 17:41:57 | 200 | 42.836773699s |       127.0.0.1 | POST     "/api/chat"
Jul 15 17:43:41 amobile ollama[17467]: [GIN] 2025/07/15 - 17:43:41 | 200 | 31.474378088s |       127.0.0.1 | POST     "/api/chat"
<!-- gh-comment-id:3075945773 -->
@abcbarryn commented on GitHub (Jul 15, 2025):

```
Jul 15 01:06:05 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:05 | 200 | 43.89µs | 127.0.0.1 | HEAD "/"
Jul 15 01:06:05 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:05 | 200 | 34.607µs | 127.0.0.1 | GET "/api/ps"
Jul 15 01:06:15 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:15 | 200 | 34.303µs | 127.0.0.1 | HEAD "/"
Jul 15 01:06:15 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:15 | 200 | 199.449466ms | 127.0.0.1 | POST "/api/show"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.135-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.7 GiB" free_swap="29.7 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.138-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.236-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 33357"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.237-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.237-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.237-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.253-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.254-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:33357"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.344-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 01:06:16 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.351-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.489-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.666-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.962-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:06:17 amobile ollama[17467]: time=2025-07-15T01:06:17.762-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.53 seconds"
Jul 15 01:06:17 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:17 | 200 | 1.92506674s | 127.0.0.1 | POST "/api/generate"
Jul 15 01:09:41 amobile ollama[17467]: [GIN] 2025/07/15 - 01:09:41 | 200 | 2m42s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:12:48 amobile ollama[17467]: [GIN] 2025/07/15 - 01:12:48 | 200 | 2m0s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:19:34 amobile ollama[17467]: [GIN] 2025/07/15 - 01:19:34 | 200 | 2m42s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.026-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.029-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41521"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.149-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.168-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.168-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:41521"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.273-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 01:24:42 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.280-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.285-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.286-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.286-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.286-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.401-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.570-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.747-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:24:43 amobile ollama[17467]: time=2025-07-15T01:24:43.911-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.76 seconds"
Jul 15 01:27:35 amobile ollama[17467]: [GIN] 2025/07/15 - 01:27:35 | 200 | 2m53s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:33:05 amobile ollama[17467]: [GIN] 2025/07/15 - 01:33:05 | 200 | 37.98540726s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.619-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.621-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.716-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41437"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.716-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.716-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.717-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.734-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.736-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:41437"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.827-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 01:40:16 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.834-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.968-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 01:40:17 amobile ollama[17467]: time=2025-07-15T01:40:17.113-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:40:17 amobile ollama[17467]: time=2025-07-15T01:40:17.288-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:40:18 amobile ollama[17467]: time=2025-07-15T01:40:18.226-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.51 seconds"
Jul 15 01:41:38 amobile ollama[17467]: [GIN] 2025/07/15 - 01:41:38 | 200 | 1m22s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:44:40 amobile ollama[17467]: [GIN] 2025/07/15 - 01:44:40 | 200 | 1m3s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:49:32 amobile ollama[17467]: [GIN] 2025/07/15 - 01:49:32 | 200 | 26.3552658s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:00:49 amobile ollama[17467]: [GIN] 2025/07/15 - 02:00:49 | 200 | 39.415µs | 127.0.0.1 | HEAD "/"
Jul 15 02:00:49 amobile ollama[17467]: [GIN] 2025/07/15 - 02:00:49 | 200 | 182.084631ms | 127.0.0.1 | POST "/api/show"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.063-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.065-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.166-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 45897"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.167-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.167-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.167-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.185-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.185-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:45897"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.274-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 02:00:50 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.281-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.286-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.287-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.287-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.287-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.419-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.575-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.848-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 02:00:51 amobile ollama[17467]: time=2025-07-15T02:00:51.684-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.52 seconds"
Jul 15 02:00:51 amobile ollama[17467]: [GIN] 2025/07/15 - 02:00:51 | 200 | 1.910645146s | 127.0.0.1 | POST "/api/generate"
Jul 15 02:04:07 amobile ollama[17467]: [GIN] 2025/07/15 - 02:04:07 | 200 | 2m49s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:06:47 amobile ollama[17467]: [GIN] 2025/07/15 - 02:06:47 | 200 | 2m12s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:10:37 amobile ollama[17467]: [GIN] 2025/07/15 - 02:10:37 | 200 | 2m24s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:13:36 amobile ollama[17467]: [GIN] 2025/07/15 - 02:13:36 | 200 | 1m45s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:18:37 amobile ollama[17467]: [GIN] 2025/07/15 - 02:18:37 | 200 | 1m29s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:20:22 amobile ollama[17467]: [GIN] 2025/07/15 - 02:20:22 | 200 | 24.101449501s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:22:28 amobile ollama[17467]: [GIN] 2025/07/15 - 02:22:28 | 200 | 1m16s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:24:37 amobile ollama[17467]: [GIN] 2025/07/15 - 02:24:37 | 200 | 1m15s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:26:57 amobile ollama[17467]: [GIN] 2025/07/15 - 02:26:57 | 200 | 1m14s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:31:07 amobile ollama[17467]: [GIN] 2025/07/15 - 02:31:07 | 200 | 2m13s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:32:33 amobile ollama[17467]: [GIN] 2025/07/15 - 02:32:33 | 200 | 11.564672162s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:36:32 amobile ollama[17467]: [GIN] 2025/07/15 - 02:36:32 | 200 | 2m21s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:41:53 amobile ollama[17467]: [GIN] 2025/07/15 - 02:41:53 | 200 | 46.782257017s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:43:59 amobile ollama[17467]: [GIN] 2025/07/15 - 02:43:59 | 200 | 1m3s | 127.0.0.1 | POST "/api/chat"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.463-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.0 GiB"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.465-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.561-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41967"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.562-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.562-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.562-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.577-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.577-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:41967"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.664-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 10:58:17 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.671-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.814-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.966-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 10:58:18 amobile ollama[17467]: time=2025-07-15T10:58:18.140-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 10:58:19 amobile ollama[17467]: time=2025-07-15T10:58:19.581-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds"
Jul 15 10:58:23 amobile ollama[17467]: [GIN] 2025/07/15 - 10:58:23 | 200 | 6.619394623s | 127.0.0.1 | POST "/api/chat"
Jul 15 10:58:51 amobile ollama[17467]: [GIN] 2025/07/15 - 10:58:51 | 200 | 2.678771024s | 127.0.0.1 | POST "/api/chat"
Jul 15 10:59:28 amobile ollama[17467]: [GIN] 2025/07/15 - 10:59:28 | 200 | 16.793880245s | 127.0.0.1 | POST "/api/chat"
Jul 15 11:01:09 amobile ollama[17467]: [GIN] 2025/07/15 - 11:01:09 | 200 | 5.583076216s | 127.0.0.1 | POST "/api/chat"
Jul 15 12:24:46 amobile ollama[17467]: [GIN] 2025/07/15 - 12:24:46 | 200 | 56.614µs | 127.0.0.1 | HEAD "/"
Jul 15 12:24:46 amobile ollama[17467]: [GIN] 2025/07/15 - 12:24:46 | 200 | 187.47452ms | 127.0.0.1 | POST "/api/show"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.225-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.0 GiB"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.227-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 38447"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.340-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.341-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:38447"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.430-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 12:24:47 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.437-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.576-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.743-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 12:24:48 amobile ollama[17467]: time=2025-07-15T12:24:48.011-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 12:24:49 amobile ollama[17467]: time=2025-07-15T12:24:49.594-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.27 seconds"
Jul 15 12:24:49 amobile ollama[17467]: [GIN] 2025/07/15 - 12:24:49 | 200 | 2.646873422s | 127.0.0.1 | POST "/api/generate"
Jul 15 12:28:09 amobile ollama[17467]: [GIN] 2025/07/15 - 12:28:09 | 200 | 2m41s | 127.0.0.1 | POST "/api/chat"
Jul 15 12:30:20 amobile ollama[17467]: [GIN] 2025/07/15 - 12:30:20 | 200 | 46.245741541s | 127.0.0.1 | POST "/api/chat"
Jul 15 12:31:18 amobile ollama[17467]: [GIN] 2025/07/15 - 12:31:18 | 200 | 40.434622179s | 127.0.0.1 | POST "/api/chat"
Jul 15 12:32:04 amobile ollama[17467]: [GIN] 2025/07/15 - 12:32:04 | 200 | 11.571502803s | 127.0.0.1 | POST "/api/chat"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.853-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.3 GiB" free_swap="30.1 GiB"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.855-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 
GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 43873" Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding" Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding" Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.975-04:00 level=INFO source=runner.go:925 msg="starting ollama engine" Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.976-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:43873" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.077-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 Jul 15 14:06:00 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.084-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 
CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.207-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.381-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.559-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 14:06:01 amobile ollama[17467]: time=2025-07-15T14:06:01.977-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds" Jul 15 14:06:14 amobile ollama[17467]: [GIN] 2025/07/15 - 14:06:14 | 200 | 15.243695865s | 127.0.0.1 | POST "/api/chat" Jul 15 14:08:49 amobile ollama[17467]: [GIN] 2025/07/15 - 14:08:49 | 200 | 7.81688474s | 127.0.0.1 | POST "/api/chat" Jul 15 14:10:32 amobile ollama[17467]: [GIN] 2025/07/15 - 14:10:32 | 200 | 5.57302473s | 127.0.0.1 | POST "/api/chat" Jul 15 14:11:52 amobile ollama[17467]: [GIN] 2025/07/15 - 14:11:52 | 200 | 4.563861397s | 127.0.0.1 | POST "/api/chat" Jul 15 14:12:21 amobile ollama[17467]: [GIN] 2025/07/15 - 14:12:21 | 200 | 7.445482222s | 127.0.0.1 | POST "/api/chat" Jul 15 14:13:40 amobile ollama[17467]: [GIN] 2025/07/15 - 14:13:40 | 200 | 8.141572187s | 
127.0.0.1 | POST "/api/chat" Jul 15 14:15:38 amobile ollama[17467]: [GIN] 2025/07/15 - 14:15:38 | 200 | 11.704692526s | 127.0.0.1 | POST "/api/chat" Jul 15 14:17:23 amobile ollama[17467]: [GIN] 2025/07/15 - 14:17:23 | 200 | 8.167364087s | 127.0.0.1 | POST "/api/chat" Jul 15 14:19:24 amobile ollama[17467]: [GIN] 2025/07/15 - 14:19:24 | 200 | 8.725596273s | 127.0.0.1 | POST "/api/chat" Jul 15 14:20:54 amobile ollama[17467]: [GIN] 2025/07/15 - 14:20:54 | 200 | 7.287166608s | 127.0.0.1 | POST "/api/chat" Jul 15 14:23:09 amobile ollama[17467]: [GIN] 2025/07/15 - 14:23:09 | 200 | 7.923928608s | 127.0.0.1 | POST "/api/chat" Jul 15 14:28:00 amobile ollama[17467]: [GIN] 2025/07/15 - 14:28:00 | 200 | 9.638210214s | 127.0.0.1 | POST "/api/chat" Jul 15 14:30:20 amobile ollama[17467]: [GIN] 2025/07/15 - 14:30:20 | 200 | 11.398374606s | 127.0.0.1 | POST "/api/chat" Jul 15 14:32:11 amobile ollama[17467]: [GIN] 2025/07/15 - 14:32:11 | 200 | 6.043563297s | 127.0.0.1 | POST "/api/chat" Jul 15 14:35:04 amobile ollama[17467]: [GIN] 2025/07/15 - 14:35:04 | 200 | 10.856669259s | 127.0.0.1 | POST "/api/chat" Jul 15 14:38:09 amobile ollama[17467]: [GIN] 2025/07/15 - 14:38:09 | 200 | 10.15686096s | 127.0.0.1 | POST "/api/chat" Jul 15 14:39:15 amobile ollama[17467]: [GIN] 2025/07/15 - 14:39:15 | 200 | 6.940100831s | 127.0.0.1 | POST "/api/chat" Jul 15 14:41:27 amobile ollama[17467]: [GIN] 2025/07/15 - 14:41:27 | 200 | 12.226914926s | 127.0.0.1 | POST "/api/chat" Jul 15 14:44:28 amobile ollama[17467]: [GIN] 2025/07/15 - 14:44:28 | 200 | 23.563901162s | 127.0.0.1 | POST "/api/chat" Jul 15 14:48:26 amobile ollama[17467]: [GIN] 2025/07/15 - 14:48:26 | 200 | 33.467552018s | 127.0.0.1 | POST "/api/chat" Jul 15 14:50:38 amobile ollama[17467]: [GIN] 2025/07/15 - 14:50:38 | 200 | 32.005724467s | 127.0.0.1 | POST "/api/chat" Jul 15 14:52:51 amobile ollama[17467]: [GIN] 2025/07/15 - 14:52:51 | 200 | 34.606237099s | 127.0.0.1 | POST "/api/chat" Jul 15 14:55:03 amobile ollama[17467]: [GIN] 2025/07/15 - 
14:55:03 | 200 | 39.147461074s | 127.0.0.1 | POST "/api/chat" Jul 15 14:57:11 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:11 | 200 | 37.597258006s | 127.0.0.1 | POST "/api/chat" Jul 15 14:57:45 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:45 | 200 | 60.643µs | 127.0.0.1 | HEAD "/" Jul 15 14:57:45 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:45 | 200 | 75.829µs | 127.0.0.1 | GET "/api/ps" Jul 15 14:57:49 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:49 | 200 | 5.326923899s | 127.0.0.1 | POST "/api/chat" Jul 15 14:58:40 amobile ollama[17467]: [GIN] 2025/07/15 - 14:58:40 | 200 | 39.146µs | 127.0.0.1 | HEAD "/" Jul 15 14:58:40 amobile ollama[17467]: [GIN] 2025/07/15 - 14:58:40 | 200 | 37.34µs | 127.0.0.1 | GET "/api/ps" Jul 15 15:31:24 amobile ollama[17467]: [GIN] 2025/07/15 - 15:31:24 | 200 | 39.551µs | 127.0.0.1 | HEAD "/" Jul 15 15:31:24 amobile ollama[17467]: [GIN] 2025/07/15 - 15:31:24 | 200 | 193.354766ms | 127.0.0.1 | POST "/api/show" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.125-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.127-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model 
/usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 35539" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.237-04:00 level=INFO source=runner.go:925 msg="starting ollama engine" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.237-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:35539" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.331-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 Jul 15 15:31:25 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.337-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU" Jul 15 15:31:25 amobile 
ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.475-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.638-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.913-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 15:31:27 amobile ollama[17467]: time=2025-07-15T15:31:27.248-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds" Jul 15 15:31:27 amobile ollama[17467]: [GIN] 2025/07/15 - 15:31:27 | 200 | 2.458372036s | 127.0.0.1 | POST "/api/generate" Jul 15 15:32:06 amobile ollama[17467]: [GIN] 2025/07/15 - 15:32:06 | 200 | 12.149772723s | 127.0.0.1 | POST "/api/chat" Jul 15 15:32:50 amobile ollama[17467]: [GIN] 2025/07/15 - 15:32:50 | 200 | 19.018623689s | 127.0.0.1 | POST "/api/chat" Jul 15 15:33:40 amobile ollama[17467]: [GIN] 2025/07/15 - 15:33:40 | 200 | 19.08334206s | 127.0.0.1 | POST "/api/chat" Jul 15 15:34:59 amobile ollama[17467]: [GIN] 2025/07/15 - 15:34:59 | 200 | 26.413488556s | 127.0.0.1 | POST "/api/chat" Jul 15 15:37:22 amobile ollama[17467]: [GIN] 2025/07/15 - 15:37:22 | 200 | 27.871657486s | 127.0.0.1 | POST "/api/chat" Jul 15 15:38:35 amobile ollama[17467]: [GIN] 2025/07/15 - 15:38:35 | 200 | 27.072325445s | 127.0.0.1 | POST "/api/chat" Jul 15 15:40:25 amobile ollama[17467]: [GIN] 2025/07/15 - 15:40:25 | 200 | 38.748835117s | 127.0.0.1 | POST "/api/chat" Jul 15 15:42:57 amobile ollama[17467]: [GIN] 2025/07/15 - 15:42:57 | 200 | 3.887759995s | 127.0.0.1 | POST "/api/chat" Jul 15 15:47:15 amobile ollama[17467]: [GIN] 2025/07/15 - 15:47:15 | 200 | 32.823251665s | 127.0.0.1 | POST "/api/chat" Jul 
15 15:47:48 amobile ollama[17467]: [GIN] 2025/07/15 - 15:47:48 | 200 | 8.797014247s | 127.0.0.1 | POST "/api/chat" Jul 15 15:48:16 amobile ollama[17467]: [GIN] 2025/07/15 - 15:48:16 | 200 | 23.792104523s | 127.0.0.1 | POST "/api/chat" Jul 15 15:48:38 amobile ollama[17467]: [GIN] 2025/07/15 - 15:48:38 | 200 | 9.21259348s | 127.0.0.1 | POST "/api/chat" Jul 15 15:49:31 amobile ollama[17467]: [GIN] 2025/07/15 - 15:49:31 | 200 | 21.710448987s | 127.0.0.1 | POST "/api/chat" Jul 15 15:49:35 amobile ollama[17467]: [GIN] 2025/07/15 - 15:49:35 | 200 | 32.825344864s | 127.0.0.1 | POST "/api/chat" Jul 15 15:49:40 amobile ollama[17467]: [GIN] 2025/07/15 - 15:49:40 | 200 | 7.626245474s | 127.0.0.1 | POST "/api/chat" Jul 15 15:50:47 amobile ollama[17467]: [GIN] 2025/07/15 - 15:50:47 | 200 | 6.319395124s | 127.0.0.1 | POST "/api/chat" Jul 15 15:50:55 amobile ollama[17467]: [GIN] 2025/07/15 - 15:50:55 | 200 | 16.482319877s | 127.0.0.1 | POST "/api/chat" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.011-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.1 GiB" free_swap="30.1 GiB" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.013-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.108-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model 
/usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 40683" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.108-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.109-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.109-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.136-04:00 level=INFO source=runner.go:925 msg="starting ollama engine" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.136-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:40683" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.233-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 Jul 15 16:00:37 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.240-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU" Jul 15 16:00:37 amobile 
ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.361-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.537-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.714-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 16:00:39 amobile ollama[17467]: time=2025-07-15T16:00:39.125-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds" Jul 15 16:00:50 amobile ollama[17467]: [GIN] 2025/07/15 - 16:00:50 | 200 | 13.453382374s | 127.0.0.1 | POST "/api/chat" Jul 15 16:01:37 amobile ollama[17467]: [GIN] 2025/07/15 - 16:01:37 | 200 | 7.010397526s | 127.0.0.1 | POST "/api/chat" Jul 15 16:02:47 amobile ollama[17467]: [GIN] 2025/07/15 - 16:02:47 | 200 | 9.218088928s | 127.0.0.1 | POST "/api/chat" Jul 15 16:03:40 amobile ollama[17467]: [GIN] 2025/07/15 - 16:03:40 | 200 | 8.419104993s | 127.0.0.1 | POST "/api/chat" Jul 15 16:06:18 amobile ollama[17467]: [GIN] 2025/07/15 - 16:06:18 | 200 | 44.640608373s | 127.0.0.1 | POST "/api/chat" Jul 15 16:06:50 amobile ollama[17467]: [GIN] 2025/07/15 - 16:06:50 | 200 | 17.14751161s | 127.0.0.1 | POST "/api/chat" Jul 15 16:07:39 amobile ollama[17467]: [GIN] 2025/07/15 - 16:07:39 | 200 | 38.689371786s | 127.0.0.1 | POST "/api/chat" Jul 15 16:08:12 amobile ollama[17467]: [GIN] 2025/07/15 - 16:08:12 | 200 | 32.899070555s | 127.0.0.1 | POST "/api/chat" Jul 15 16:08:34 amobile ollama[17467]: time=2025-07-15T16:08:34.345-04:00 level=WARN source=runner.go:157 msg="truncating input prompt" limit=4096 prompt=19010 keep=4 new=4096 Jul 15 16:09:46 amobile ollama[17467]: [GIN] 2025/07/15 - 16:09:46 | 200 
| 25.107300894s | 127.0.0.1 | POST "/api/chat" Jul 15 16:12:18 amobile ollama[17467]: [GIN] 2025/07/15 - 16:12:18 | 200 | 3m43s | 127.0.0.1 | POST "/api/chat" Jul 15 16:12:19 amobile ollama[17467]: [GIN] 2025/07/15 - 16:12:19 | 200 | 1m36s | 127.0.0.1 | POST "/api/chat" Jul 15 16:15:05 amobile ollama[17467]: [GIN] 2025/07/15 - 16:15:05 | 200 | 1m53s | 127.0.0.1 | POST "/api/chat" Jul 15 16:15:38 amobile ollama[17467]: [GIN] 2025/07/15 - 16:15:38 | 200 | 1m14s | 127.0.0.1 | POST "/api/chat" Jul 15 16:15:48 amobile ollama[17467]: [GIN] 2025/07/15 - 16:15:48 | 200 | 33.070491476s | 127.0.0.1 | POST "/api/chat" Jul 15 16:17:31 amobile ollama[17467]: [GIN] 2025/07/15 - 16:17:31 | 200 | 1m0s | 127.0.0.1 | POST "/api/chat" Jul 15 16:21:06 amobile ollama[17467]: [GIN] 2025/07/15 - 16:21:06 | 200 | 34.303595005s | 127.0.0.1 | POST "/api/chat" Jul 15 16:21:32 amobile ollama[17467]: panic: failed to decode batch: could not find a kv cache slot (cache: 2560 batch: 512) Jul 15 16:21:32 amobile ollama[17467]: goroutine 8 [running]: Jul 15 16:21:32 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0002b6900, {0x55867a095700, 0xc0000008c0}) Jul 15 16:21:32 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner/runner.go:364 +0x65 Jul 15 16:21:32 amobile ollama[17467]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 Jul 15 16:21:32 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74 Jul 15 16:21:32 amobile ollama[17467]: time=2025-07-15T16:21:32.337-04:00 level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:40683/completion\": EOF" Jul 15 16:21:32 amobile ollama[17467]: [GIN] 2025/07/15 - 16:21:32 | 500 | 11.994152362s | 127.0.0.1 | POST "/api/chat" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.458-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB" Jul 15 
16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.461-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.562-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 45949" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.562-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.562-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.563-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.581-04:00 level=INFO source=runner.go:925 msg="starting ollama engine" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.582-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:45949" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.678-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 Jul 15 16:22:39 amobile ollama[17467]: 
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.684-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.814-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.973-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 16:22:40 amobile ollama[17467]: time=2025-07-15T16:22:40.238-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 16:22:41 amobile ollama[17467]: time=2025-07-15T16:22:41.844-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.28 seconds" Jul 15 16:23:21 amobile ollama[17467]: [GIN] 2025/07/15 - 16:23:21 | 200 | 42.358125818s | 127.0.0.1 | POST "/api/chat" Jul 15 16:35:49 amobile ollama[17467]: time=2025-07-15T16:35:49.962-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB" Jul 15 16:35:49 amobile ollama[17467]: time=2025-07-15T16:35:49.964-04:00 level=INFO source=server.go:175 
msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.062-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 36385" Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.062-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.063-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding" Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.063-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding" Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.080-04:00 level=INFO source=runner.go:925 msg="starting ollama engine" Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.081-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:36385" Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.178-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 Jul 15 16:35:50 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so Jul 15 16:35:50 amobile 
ollama[17467]: time=2025-07-15T16:35:50.186-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.314-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.481-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.656-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:35:52 amobile ollama[17467]: time=2025-07-15T16:35:52.094-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.03 seconds"
Jul 15 16:36:34 amobile ollama[17467]: [GIN] 2025/07/15 - 16:36:34 | 200 | 44.698539465s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:37:39 amobile ollama[17467]: [GIN] 2025/07/15 - 16:37:39 | 200 | 15.367562581s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:39:19 amobile ollama[17467]: [GIN] 2025/07/15 - 16:39:19 | 200 | 58.985695818s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:39:26 amobile ollama[17467]: [GIN] 2025/07/15 - 16:39:26 | 200 | 1m4s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:40:27 amobile ollama[17467]: [GIN] 2025/07/15 - 16:40:27 | 200 | 32.498111154s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:41:19 amobile ollama[17467]: [GIN] 2025/07/15 - 16:41:19 | 200 | 25.790679315s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:42:42 amobile ollama[17467]: [GIN] 2025/07/15 - 16:42:42 | 200 | 29.962983809s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:44:26 amobile ollama[17467]: [GIN] 2025/07/15 - 16:44:26 | 200 | 20.049622997s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:45:03 amobile ollama[17467]: [GIN] 2025/07/15 - 16:45:03 | 200 | 11.432715593s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:45:30 amobile ollama[17467]: [GIN] 2025/07/15 - 16:45:30 | 200 | 15.81967878s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:47:41 amobile ollama[17467]: [GIN] 2025/07/15 - 16:47:41 | 200 | 1m15s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:49:29 amobile ollama[17467]: [GIN] 2025/07/15 - 16:49:29 | 200 | 3m9s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:51:05 amobile ollama[17467]: [GIN] 2025/07/15 - 16:51:05 | 200 | 14.459985878s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:52:31 amobile ollama[17467]: panic: failed to decode batch: could not find a kv cache slot (cache: 2560 batch: 512)
Jul 15 16:52:31 amobile ollama[17467]: goroutine 14 [running]:
Jul 15 16:52:31 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc00065e900, {0x555e3641c700, 0xc0001308c0})
Jul 15 16:52:31 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:364 +0x65
Jul 15 16:52:31 amobile ollama[17467]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
Jul 15 16:52:31 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74
Jul 15 16:52:31 amobile ollama[17467]: time=2025-07-15T16:52:31.866-04:00 level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:36385/completion\": EOF"
Jul 15 16:52:31 amobile ollama[17467]: [GIN] 2025/07/15 - 16:52:31 | 200 | 13.142628839s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.611-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.613-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 39263"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.739-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.740-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:39263"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.836-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 16:53:23 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.842-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.976-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 16:53:24 amobile ollama[17467]: time=2025-07-15T16:53:24.139-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:53:24 amobile ollama[17467]: time=2025-07-15T16:53:24.409-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:53:26 amobile ollama[17467]: time=2025-07-15T16:53:26.009-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.28 seconds"
Jul 15 16:54:38 amobile ollama[17467]: [GIN] 2025/07/15 - 16:54:38 | 200 | 1m15s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:56:20 amobile ollama[17467]: [GIN] 2025/07/15 - 16:56:20 | 200 | 1m9s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:59:05 amobile ollama[17467]: [GIN] 2025/07/15 - 16:59:05 | 200 | 1m34s | 127.0.0.1 | POST "/api/chat"
Jul 15 17:00:40 amobile ollama[17467]: [GIN] 2025/07/15 - 17:00:40 | 200 | 1m33s | 127.0.0.1 | POST "/api/chat"
Jul 15 17:01:10 amobile ollama[17467]: [GIN] 2025/07/15 - 17:01:10 | 200 | 140.204µs | 127.0.0.1 | GET "/api/version"
Jul 15 17:02:26 amobile ollama[17467]: [GIN] 2025/07/15 - 17:02:26 | 200 | 1m37s | 127.0.0.1 | POST "/api/chat"
Jul 15 17:35:05 amobile ollama[17467]: [GIN] 2025/07/15 - 17:35:05 | 200 | 52.773µs | 127.0.0.1 | HEAD "/"
Jul 15 17:35:05 amobile ollama[17467]: [GIN] 2025/07/15 - 17:35:05 | 200 | 204.188719ms | 127.0.0.1 | POST "/api/show"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.674-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.676-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 36275"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.788-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.790-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:36275"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.880-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 17:35:05 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.888-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 17:35:06 amobile ollama[17467]: time=2025-07-15T17:35:06.025-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 17:35:06 amobile ollama[17467]: time=2025-07-15T17:35:06.174-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 17:35:06 amobile ollama[17467]: time=2025-07-15T17:35:06.447-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 17:35:08 amobile ollama[17467]: time=2025-07-15T17:35:08.049-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.27 seconds"
Jul 15 17:35:08 amobile ollama[17467]: [GIN] 2025/07/15 - 17:35:08 | 200 | 2.657473117s | 127.0.0.1 | POST "/api/generate"
Jul 15 17:39:08 amobile ollama[17467]: [GIN] 2025/07/15 - 17:39:08 | 200 | 2m2s | 127.0.0.1 | POST "/api/chat"
Jul 15 17:41:57 amobile ollama[17467]: [GIN] 2025/07/15 - 17:41:57 | 200 | 42.836773699s | 127.0.0.1 | POST "/api/chat"
Jul 15 17:43:41 amobile ollama[17467]: [GIN] 2025/07/15 - 17:43:41 | 200 | 31.474378088s | 127.0.0.1 | POST "/api/chat"
```

@rick-github commented on GitHub (Jul 15, 2025):

```
Jul 15 16:21:32 amobile ollama[17467]: panic: failed to decode batch: could not find a kv cache slot (cache: 2560 batch: 512)
Jul 15 16:21:32 amobile ollama[17467]: goroutine 8 [running]:
Jul 15 16:21:32 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0002b6900, {0x55867a095700, 0xc0000008c0})
Jul 15 16:21:32 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:364 +0x65
Jul 15 16:21:32 amobile ollama[17467]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
Jul 15 16:21:32 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74
Jul 15 16:21:32 amobile ollama[17467]: time=2025-07-15T16:21:32.337-04:00 level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:40683/completion\": EOF"
```

Could be #10127. A workaround is to set `OLLAMA_NUM_PARALLEL=1`.
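For a systemd-managed install like the one in the logs, one way to apply the workaround is a drop-in override for the service unit. A minimal sketch, assuming the unit is named `ollama.service` (adjust for your install):

```shell
# Set OLLAMA_NUM_PARALLEL=1 for the ollama systemd service via a drop-in
# override, then reload and restart so the runner picks up the new value.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_NUM_PARALLEL=1"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

With `--parallel 1` the runner no longer splits the context window across concurrent sequences, which avoids the slot contention seen in the panic.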


Reference: github-starred/ollama#54061