[GH-ISSUE #11434] crash when running gemma3 model and asking it to calculate the time between two dates. #54061

Closed
opened 2026-04-29 05:10:08 -05:00 by GiteaMirror · 5 comments

Originally created by @abcbarryn on GitHub (Jul 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11434

What is the issue?

I asked the model to calculate the length of time between July 15th, 2025 and July 25th, 2025, and it started stuttering and then crashed.
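For reference, the calculation the model was asked to perform is trivial to verify outside the model. A minimal sketch using Python's standard library, with the two dates taken from the report:

```python
from datetime import date

# The two dates from the prompt that triggered the issue.
start = date(2025, 7, 15)
end = date(2025, 7, 25)

# Subtracting two dates yields a timedelta.
delta = end - start
print(delta.days)  # → 10
```

The expected answer is 10 days, so the failure is in the model/runner behavior, not in any ambiguity of the question.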

Relevant log output

Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.026-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.029-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41521"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1

OS

SuSE Linux

GPU

None

CPU

Two CPUs with 14 cores each.

Ollama version

ollama version is 0.9.6

GiteaMirror added the bug label 2026-04-29 05:10:08 -05:00

@rick-github commented on GitHub (Jul 15, 2025):

The log doesn't show a crash.


@abcbarryn commented on GitHub (Jul 15, 2025):

Maybe the model crashed? Something crashed. The terminal was stuck in a stuttering output loop until I pressed Ctrl-C.


@rick-github commented on GitHub (Jul 15, 2025):

Perhaps if you add more than 4 lines of log.


@abcbarryn commented on GitHub (Jul 15, 2025):

Jul 15 01:06:05 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:05 | 200 |       43.89µs |       127.0.0.1 | HEAD     "/"
Jul 15 01:06:05 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:05 | 200 |      34.607µs |       127.0.0.1 | GET      "/api/ps"
Jul 15 01:06:15 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:15 | 200 |      34.303µs |       127.0.0.1 | HEAD     "/"
Jul 15 01:06:15 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:15 | 200 |  199.449466ms |       127.0.0.1 | POST     "/api/show"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.135-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.7 GiB" free_swap="29.7 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.138-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.236-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 33357"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.237-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.237-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.237-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.253-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.254-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:33357"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.344-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 01:06:16 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.351-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.489-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.666-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.962-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:06:17 amobile ollama[17467]: time=2025-07-15T01:06:17.762-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.53 seconds"
Jul 15 01:06:17 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:17 | 200 |   1.92506674s |       127.0.0.1 | POST     "/api/generate"
Jul 15 01:09:41 amobile ollama[17467]: [GIN] 2025/07/15 - 01:09:41 | 200 |         2m42s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:12:48 amobile ollama[17467]: [GIN] 2025/07/15 - 01:12:48 | 200 |          2m0s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:19:34 amobile ollama[17467]: [GIN] 2025/07/15 - 01:19:34 | 200 |         2m42s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.026-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.029-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41521"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.149-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.168-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.168-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:41521"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.273-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 01:24:42 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.280-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.285-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.286-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.286-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.286-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.401-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.570-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.747-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:24:43 amobile ollama[17467]: time=2025-07-15T01:24:43.911-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.76 seconds"
Jul 15 01:27:35 amobile ollama[17467]: [GIN] 2025/07/15 - 01:27:35 | 200 |         2m53s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:33:05 amobile ollama[17467]: [GIN] 2025/07/15 - 01:33:05 | 200 |  37.98540726s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.619-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.621-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.716-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41437"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.716-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.716-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.717-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.734-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.736-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:41437"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.827-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 01:40:16 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.834-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.968-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 01:40:17 amobile ollama[17467]: time=2025-07-15T01:40:17.113-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:40:17 amobile ollama[17467]: time=2025-07-15T01:40:17.288-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:40:18 amobile ollama[17467]: time=2025-07-15T01:40:18.226-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.51 seconds"
Jul 15 01:41:38 amobile ollama[17467]: [GIN] 2025/07/15 - 01:41:38 | 200 |         1m22s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:44:40 amobile ollama[17467]: [GIN] 2025/07/15 - 01:44:40 | 200 |          1m3s |       127.0.0.1 | POST     "/api/chat"
Jul 15 01:49:32 amobile ollama[17467]: [GIN] 2025/07/15 - 01:49:32 | 200 |   26.3552658s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:00:49 amobile ollama[17467]: [GIN] 2025/07/15 - 02:00:49 | 200 |      39.415µs |       127.0.0.1 | HEAD     "/"
Jul 15 02:00:49 amobile ollama[17467]: [GIN] 2025/07/15 - 02:00:49 | 200 |  182.084631ms |       127.0.0.1 | POST     "/api/show"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.063-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.065-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.166-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 45897"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.167-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.167-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.167-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.185-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.185-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:45897"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.274-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 02:00:50 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.281-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.286-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.287-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.287-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.287-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.419-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.575-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.848-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 02:00:51 amobile ollama[17467]: time=2025-07-15T02:00:51.684-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.52 seconds"
Jul 15 02:00:51 amobile ollama[17467]: [GIN] 2025/07/15 - 02:00:51 | 200 |  1.910645146s |       127.0.0.1 | POST     "/api/generate"
Jul 15 02:04:07 amobile ollama[17467]: [GIN] 2025/07/15 - 02:04:07 | 200 |         2m49s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:06:47 amobile ollama[17467]: [GIN] 2025/07/15 - 02:06:47 | 200 |         2m12s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:10:37 amobile ollama[17467]: [GIN] 2025/07/15 - 02:10:37 | 200 |         2m24s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:13:36 amobile ollama[17467]: [GIN] 2025/07/15 - 02:13:36 | 200 |         1m45s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:18:37 amobile ollama[17467]: [GIN] 2025/07/15 - 02:18:37 | 200 |         1m29s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:20:22 amobile ollama[17467]: [GIN] 2025/07/15 - 02:20:22 | 200 | 24.101449501s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:22:28 amobile ollama[17467]: [GIN] 2025/07/15 - 02:22:28 | 200 |         1m16s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:24:37 amobile ollama[17467]: [GIN] 2025/07/15 - 02:24:37 | 200 |         1m15s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:26:57 amobile ollama[17467]: [GIN] 2025/07/15 - 02:26:57 | 200 |         1m14s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:31:07 amobile ollama[17467]: [GIN] 2025/07/15 - 02:31:07 | 200 |         2m13s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:32:33 amobile ollama[17467]: [GIN] 2025/07/15 - 02:32:33 | 200 | 11.564672162s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:36:32 amobile ollama[17467]: [GIN] 2025/07/15 - 02:36:32 | 200 |         2m21s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:41:53 amobile ollama[17467]: [GIN] 2025/07/15 - 02:41:53 | 200 | 46.782257017s |       127.0.0.1 | POST     "/api/chat"
Jul 15 02:43:59 amobile ollama[17467]: [GIN] 2025/07/15 - 02:43:59 | 200 |          1m3s |       127.0.0.1 | POST     "/api/chat"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.463-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.0 GiB"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.465-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.561-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41967"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.562-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.562-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.562-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.577-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.577-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:41967"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.664-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 10:58:17 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.671-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.814-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.966-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 10:58:18 amobile ollama[17467]: time=2025-07-15T10:58:18.140-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 10:58:19 amobile ollama[17467]: time=2025-07-15T10:58:19.581-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds"
Jul 15 10:58:23 amobile ollama[17467]: [GIN] 2025/07/15 - 10:58:23 | 200 |  6.619394623s |       127.0.0.1 | POST     "/api/chat"
Jul 15 10:58:51 amobile ollama[17467]: [GIN] 2025/07/15 - 10:58:51 | 200 |  2.678771024s |       127.0.0.1 | POST     "/api/chat"
Jul 15 10:59:28 amobile ollama[17467]: [GIN] 2025/07/15 - 10:59:28 | 200 | 16.793880245s |       127.0.0.1 | POST     "/api/chat"
Jul 15 11:01:09 amobile ollama[17467]: [GIN] 2025/07/15 - 11:01:09 | 200 |  5.583076216s |       127.0.0.1 | POST     "/api/chat"
Jul 15 12:24:46 amobile ollama[17467]: [GIN] 2025/07/15 - 12:24:46 | 200 |      56.614µs |       127.0.0.1 | HEAD     "/"
Jul 15 12:24:46 amobile ollama[17467]: [GIN] 2025/07/15 - 12:24:46 | 200 |   187.47452ms |       127.0.0.1 | POST     "/api/show"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.225-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.0 GiB"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.227-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 38447"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.340-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.341-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:38447"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.430-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 12:24:47 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.437-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.576-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.743-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 12:24:48 amobile ollama[17467]: time=2025-07-15T12:24:48.011-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 12:24:49 amobile ollama[17467]: time=2025-07-15T12:24:49.594-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.27 seconds"
Jul 15 12:24:49 amobile ollama[17467]: [GIN] 2025/07/15 - 12:24:49 | 200 |  2.646873422s |       127.0.0.1 | POST     "/api/generate"
Jul 15 12:28:09 amobile ollama[17467]: [GIN] 2025/07/15 - 12:28:09 | 200 |         2m41s |       127.0.0.1 | POST     "/api/chat"
Jul 15 12:30:20 amobile ollama[17467]: [GIN] 2025/07/15 - 12:30:20 | 200 | 46.245741541s |       127.0.0.1 | POST     "/api/chat"
Jul 15 12:31:18 amobile ollama[17467]: [GIN] 2025/07/15 - 12:31:18 | 200 | 40.434622179s |       127.0.0.1 | POST     "/api/chat"
Jul 15 12:32:04 amobile ollama[17467]: [GIN] 2025/07/15 - 12:32:04 | 200 | 11.571502803s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.853-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.3 GiB" free_swap="30.1 GiB"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.855-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 43873"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.975-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.976-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:43873"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.077-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 14:06:00 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.084-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.207-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.381-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.559-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 14:06:01 amobile ollama[17467]: time=2025-07-15T14:06:01.977-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds"
Jul 15 14:06:14 amobile ollama[17467]: [GIN] 2025/07/15 - 14:06:14 | 200 | 15.243695865s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:08:49 amobile ollama[17467]: [GIN] 2025/07/15 - 14:08:49 | 200 |   7.81688474s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:10:32 amobile ollama[17467]: [GIN] 2025/07/15 - 14:10:32 | 200 |   5.57302473s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:11:52 amobile ollama[17467]: [GIN] 2025/07/15 - 14:11:52 | 200 |  4.563861397s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:12:21 amobile ollama[17467]: [GIN] 2025/07/15 - 14:12:21 | 200 |  7.445482222s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:13:40 amobile ollama[17467]: [GIN] 2025/07/15 - 14:13:40 | 200 |  8.141572187s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:15:38 amobile ollama[17467]: [GIN] 2025/07/15 - 14:15:38 | 200 | 11.704692526s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:17:23 amobile ollama[17467]: [GIN] 2025/07/15 - 14:17:23 | 200 |  8.167364087s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:19:24 amobile ollama[17467]: [GIN] 2025/07/15 - 14:19:24 | 200 |  8.725596273s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:20:54 amobile ollama[17467]: [GIN] 2025/07/15 - 14:20:54 | 200 |  7.287166608s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:23:09 amobile ollama[17467]: [GIN] 2025/07/15 - 14:23:09 | 200 |  7.923928608s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:28:00 amobile ollama[17467]: [GIN] 2025/07/15 - 14:28:00 | 200 |  9.638210214s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:30:20 amobile ollama[17467]: [GIN] 2025/07/15 - 14:30:20 | 200 | 11.398374606s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:32:11 amobile ollama[17467]: [GIN] 2025/07/15 - 14:32:11 | 200 |  6.043563297s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:35:04 amobile ollama[17467]: [GIN] 2025/07/15 - 14:35:04 | 200 | 10.856669259s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:38:09 amobile ollama[17467]: [GIN] 2025/07/15 - 14:38:09 | 200 |  10.15686096s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:39:15 amobile ollama[17467]: [GIN] 2025/07/15 - 14:39:15 | 200 |  6.940100831s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:41:27 amobile ollama[17467]: [GIN] 2025/07/15 - 14:41:27 | 200 | 12.226914926s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:44:28 amobile ollama[17467]: [GIN] 2025/07/15 - 14:44:28 | 200 | 23.563901162s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:48:26 amobile ollama[17467]: [GIN] 2025/07/15 - 14:48:26 | 200 | 33.467552018s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:50:38 amobile ollama[17467]: [GIN] 2025/07/15 - 14:50:38 | 200 | 32.005724467s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:52:51 amobile ollama[17467]: [GIN] 2025/07/15 - 14:52:51 | 200 | 34.606237099s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:55:03 amobile ollama[17467]: [GIN] 2025/07/15 - 14:55:03 | 200 | 39.147461074s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:57:11 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:11 | 200 | 37.597258006s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:57:45 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:45 | 200 |      60.643µs |       127.0.0.1 | HEAD     "/"
Jul 15 14:57:45 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:45 | 200 |      75.829µs |       127.0.0.1 | GET      "/api/ps"
Jul 15 14:57:49 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:49 | 200 |  5.326923899s |       127.0.0.1 | POST     "/api/chat"
Jul 15 14:58:40 amobile ollama[17467]: [GIN] 2025/07/15 - 14:58:40 | 200 |      39.146µs |       127.0.0.1 | HEAD     "/"
Jul 15 14:58:40 amobile ollama[17467]: [GIN] 2025/07/15 - 14:58:40 | 200 |       37.34µs |       127.0.0.1 | GET      "/api/ps"
Jul 15 15:31:24 amobile ollama[17467]: [GIN] 2025/07/15 - 15:31:24 | 200 |      39.551µs |       127.0.0.1 | HEAD     "/"
Jul 15 15:31:24 amobile ollama[17467]: [GIN] 2025/07/15 - 15:31:24 | 200 |  193.354766ms |       127.0.0.1 | POST     "/api/show"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.125-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.127-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 35539"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.237-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.237-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:35539"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.331-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 15:31:25 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.337-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.475-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.638-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.913-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 15:31:27 amobile ollama[17467]: time=2025-07-15T15:31:27.248-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds"
Jul 15 15:31:27 amobile ollama[17467]: [GIN] 2025/07/15 - 15:31:27 | 200 |  2.458372036s |       127.0.0.1 | POST     "/api/generate"
Jul 15 15:32:06 amobile ollama[17467]: [GIN] 2025/07/15 - 15:32:06 | 200 | 12.149772723s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:32:50 amobile ollama[17467]: [GIN] 2025/07/15 - 15:32:50 | 200 | 19.018623689s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:33:40 amobile ollama[17467]: [GIN] 2025/07/15 - 15:33:40 | 200 |  19.08334206s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:34:59 amobile ollama[17467]: [GIN] 2025/07/15 - 15:34:59 | 200 | 26.413488556s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:37:22 amobile ollama[17467]: [GIN] 2025/07/15 - 15:37:22 | 200 | 27.871657486s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:38:35 amobile ollama[17467]: [GIN] 2025/07/15 - 15:38:35 | 200 | 27.072325445s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:40:25 amobile ollama[17467]: [GIN] 2025/07/15 - 15:40:25 | 200 | 38.748835117s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:42:57 amobile ollama[17467]: [GIN] 2025/07/15 - 15:42:57 | 200 |  3.887759995s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:47:15 amobile ollama[17467]: [GIN] 2025/07/15 - 15:47:15 | 200 | 32.823251665s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:47:48 amobile ollama[17467]: [GIN] 2025/07/15 - 15:47:48 | 200 |  8.797014247s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:48:16 amobile ollama[17467]: [GIN] 2025/07/15 - 15:48:16 | 200 | 23.792104523s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:48:38 amobile ollama[17467]: [GIN] 2025/07/15 - 15:48:38 | 200 |   9.21259348s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:49:31 amobile ollama[17467]: [GIN] 2025/07/15 - 15:49:31 | 200 | 21.710448987s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:49:35 amobile ollama[17467]: [GIN] 2025/07/15 - 15:49:35 | 200 | 32.825344864s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:49:40 amobile ollama[17467]: [GIN] 2025/07/15 - 15:49:40 | 200 |  7.626245474s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:50:47 amobile ollama[17467]: [GIN] 2025/07/15 - 15:50:47 | 200 |  6.319395124s |       127.0.0.1 | POST     "/api/chat"
Jul 15 15:50:55 amobile ollama[17467]: [GIN] 2025/07/15 - 15:50:55 | 200 | 16.482319877s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.011-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.1 GiB" free_swap="30.1 GiB"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.013-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.108-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 40683"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.108-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.109-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.109-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.136-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.136-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:40683"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.233-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 16:00:37 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.240-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.361-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.537-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.714-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:00:39 amobile ollama[17467]: time=2025-07-15T16:00:39.125-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds"
Jul 15 16:00:50 amobile ollama[17467]: [GIN] 2025/07/15 - 16:00:50 | 200 | 13.453382374s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:01:37 amobile ollama[17467]: [GIN] 2025/07/15 - 16:01:37 | 200 |  7.010397526s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:02:47 amobile ollama[17467]: [GIN] 2025/07/15 - 16:02:47 | 200 |  9.218088928s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:03:40 amobile ollama[17467]: [GIN] 2025/07/15 - 16:03:40 | 200 |  8.419104993s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:06:18 amobile ollama[17467]: [GIN] 2025/07/15 - 16:06:18 | 200 | 44.640608373s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:06:50 amobile ollama[17467]: [GIN] 2025/07/15 - 16:06:50 | 200 |  17.14751161s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:07:39 amobile ollama[17467]: [GIN] 2025/07/15 - 16:07:39 | 200 | 38.689371786s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:08:12 amobile ollama[17467]: [GIN] 2025/07/15 - 16:08:12 | 200 | 32.899070555s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:08:34 amobile ollama[17467]: time=2025-07-15T16:08:34.345-04:00 level=WARN source=runner.go:157 msg="truncating input prompt" limit=4096 prompt=19010 keep=4 new=4096
Jul 15 16:09:46 amobile ollama[17467]: [GIN] 2025/07/15 - 16:09:46 | 200 | 25.107300894s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:12:18 amobile ollama[17467]: [GIN] 2025/07/15 - 16:12:18 | 200 |         3m43s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:12:19 amobile ollama[17467]: [GIN] 2025/07/15 - 16:12:19 | 200 |         1m36s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:15:05 amobile ollama[17467]: [GIN] 2025/07/15 - 16:15:05 | 200 |         1m53s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:15:38 amobile ollama[17467]: [GIN] 2025/07/15 - 16:15:38 | 200 |         1m14s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:15:48 amobile ollama[17467]: [GIN] 2025/07/15 - 16:15:48 | 200 | 33.070491476s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:17:31 amobile ollama[17467]: [GIN] 2025/07/15 - 16:17:31 | 200 |          1m0s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:21:06 amobile ollama[17467]: [GIN] 2025/07/15 - 16:21:06 | 200 | 34.303595005s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:21:32 amobile ollama[17467]: panic: failed to decode batch: could not find a kv cache slot (cache: 2560 batch: 512)
Jul 15 16:21:32 amobile ollama[17467]: goroutine 8 [running]:
Jul 15 16:21:32 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0002b6900, {0x55867a095700, 0xc0000008c0})
Jul 15 16:21:32 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:364 +0x65
Jul 15 16:21:32 amobile ollama[17467]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
Jul 15 16:21:32 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74
Jul 15 16:21:32 amobile ollama[17467]: time=2025-07-15T16:21:32.337-04:00 level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:40683/completion\": EOF"
Jul 15 16:21:32 amobile ollama[17467]: [GIN] 2025/07/15 - 16:21:32 | 500 | 11.994152362s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.458-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.461-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.562-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 45949"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.562-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.562-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.563-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.581-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.582-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:45949"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.678-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 16:22:39 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.684-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.814-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.973-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:22:40 amobile ollama[17467]: time=2025-07-15T16:22:40.238-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:22:41 amobile ollama[17467]: time=2025-07-15T16:22:41.844-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.28 seconds"
Jul 15 16:23:21 amobile ollama[17467]: [GIN] 2025/07/15 - 16:23:21 | 200 | 42.358125818s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:35:49 amobile ollama[17467]: time=2025-07-15T16:35:49.962-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 16:35:49 amobile ollama[17467]: time=2025-07-15T16:35:49.964-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.062-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 36385"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.062-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.063-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.063-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.080-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.081-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:36385"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.178-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 16:35:50 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.186-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.314-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.481-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.656-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:35:52 amobile ollama[17467]: time=2025-07-15T16:35:52.094-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.03 seconds"
Jul 15 16:36:34 amobile ollama[17467]: [GIN] 2025/07/15 - 16:36:34 | 200 | 44.698539465s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:37:39 amobile ollama[17467]: [GIN] 2025/07/15 - 16:37:39 | 200 | 15.367562581s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:39:19 amobile ollama[17467]: [GIN] 2025/07/15 - 16:39:19 | 200 | 58.985695818s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:39:26 amobile ollama[17467]: [GIN] 2025/07/15 - 16:39:26 | 200 |          1m4s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:40:27 amobile ollama[17467]: [GIN] 2025/07/15 - 16:40:27 | 200 | 32.498111154s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:41:19 amobile ollama[17467]: [GIN] 2025/07/15 - 16:41:19 | 200 | 25.790679315s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:42:42 amobile ollama[17467]: [GIN] 2025/07/15 - 16:42:42 | 200 | 29.962983809s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:44:26 amobile ollama[17467]: [GIN] 2025/07/15 - 16:44:26 | 200 | 20.049622997s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:45:03 amobile ollama[17467]: [GIN] 2025/07/15 - 16:45:03 | 200 | 11.432715593s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:45:30 amobile ollama[17467]: [GIN] 2025/07/15 - 16:45:30 | 200 |  15.81967878s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:47:41 amobile ollama[17467]: [GIN] 2025/07/15 - 16:47:41 | 200 |         1m15s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:49:29 amobile ollama[17467]: [GIN] 2025/07/15 - 16:49:29 | 200 |          3m9s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:51:05 amobile ollama[17467]: [GIN] 2025/07/15 - 16:51:05 | 200 | 14.459985878s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:52:31 amobile ollama[17467]: panic: failed to decode batch: could not find a kv cache slot (cache: 2560 batch: 512)
Jul 15 16:52:31 amobile ollama[17467]: goroutine 14 [running]:
Jul 15 16:52:31 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc00065e900, {0x555e3641c700, 0xc0001308c0})
Jul 15 16:52:31 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:364 +0x65
Jul 15 16:52:31 amobile ollama[17467]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
Jul 15 16:52:31 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74
Jul 15 16:52:31 amobile ollama[17467]: time=2025-07-15T16:52:31.866-04:00 level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:36385/completion\": EOF"
Jul 15 16:52:31 amobile ollama[17467]: [GIN] 2025/07/15 - 16:52:31 | 200 | 13.142628839s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.611-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.613-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 39263"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.739-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.740-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:39263"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.836-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 16:53:23 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.842-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.976-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 16:53:24 amobile ollama[17467]: time=2025-07-15T16:53:24.139-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:53:24 amobile ollama[17467]: time=2025-07-15T16:53:24.409-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:53:26 amobile ollama[17467]: time=2025-07-15T16:53:26.009-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.28 seconds"
Jul 15 16:54:38 amobile ollama[17467]: [GIN] 2025/07/15 - 16:54:38 | 200 |         1m15s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:56:20 amobile ollama[17467]: [GIN] 2025/07/15 - 16:56:20 | 200 |          1m9s |       127.0.0.1 | POST     "/api/chat"
Jul 15 16:59:05 amobile ollama[17467]: [GIN] 2025/07/15 - 16:59:05 | 200 |         1m34s |       127.0.0.1 | POST     "/api/chat"
Jul 15 17:00:40 amobile ollama[17467]: [GIN] 2025/07/15 - 17:00:40 | 200 |         1m33s |       127.0.0.1 | POST     "/api/chat"
Jul 15 17:01:10 amobile ollama[17467]: [GIN] 2025/07/15 - 17:01:10 | 200 |     140.204µs |       127.0.0.1 | GET      "/api/version"
Jul 15 17:02:26 amobile ollama[17467]: [GIN] 2025/07/15 - 17:02:26 | 200 |         1m37s |       127.0.0.1 | POST     "/api/chat"
Jul 15 17:35:05 amobile ollama[17467]: [GIN] 2025/07/15 - 17:35:05 | 200 |      52.773µs |       127.0.0.1 | HEAD     "/"
Jul 15 17:35:05 amobile ollama[17467]: [GIN] 2025/07/15 - 17:35:05 | 200 |  204.188719ms |       127.0.0.1 | POST     "/api/show"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.674-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.676-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 36275"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.788-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.790-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:36275"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.880-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 17:35:05 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.888-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 17:35:06 amobile ollama[17467]: time=2025-07-15T17:35:06.025-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 17:35:06 amobile ollama[17467]: time=2025-07-15T17:35:06.174-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 17:35:06 amobile ollama[17467]: time=2025-07-15T17:35:06.447-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 17:35:08 amobile ollama[17467]: time=2025-07-15T17:35:08.049-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.27 seconds"
Jul 15 17:35:08 amobile ollama[17467]: [GIN] 2025/07/15 - 17:35:08 | 200 |  2.657473117s |       127.0.0.1 | POST     "/api/generate"
Jul 15 17:39:08 amobile ollama[17467]: [GIN] 2025/07/15 - 17:39:08 | 200 |          2m2s |       127.0.0.1 | POST     "/api/chat"
Jul 15 17:41:57 amobile ollama[17467]: [GIN] 2025/07/15 - 17:41:57 | 200 | 42.836773699s |       127.0.0.1 | POST     "/api/chat"
Jul 15 17:43:41 amobile ollama[17467]: [GIN] 2025/07/15 - 17:43:41 | 200 | 31.474378088s |       127.0.0.1 | POST     "/api/chat"
<!-- gh-comment-id:3075945773 -->
@abcbarryn commented on GitHub (Jul 15, 2025):

```
Jul 15 01:06:05 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:05 | 200 | 43.89µs | 127.0.0.1 | HEAD "/"
Jul 15 01:06:05 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:05 | 200 | 34.607µs | 127.0.0.1 | GET "/api/ps"
Jul 15 01:06:15 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:15 | 200 | 34.303µs | 127.0.0.1 | HEAD "/"
Jul 15 01:06:15 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:15 | 200 | 199.449466ms | 127.0.0.1 | POST "/api/show"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.135-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.7 GiB" free_swap="29.7 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.138-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.236-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 33357"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.237-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.237-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.237-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.253-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.254-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:33357"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.344-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 01:06:16 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.351-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.357-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.489-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.666-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:06:16 amobile ollama[17467]: time=2025-07-15T01:06:16.962-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:06:17 amobile ollama[17467]: time=2025-07-15T01:06:17.762-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.53 seconds"
Jul 15 01:06:17 amobile ollama[17467]: [GIN] 2025/07/15 - 01:06:17 | 200 | 1.92506674s | 127.0.0.1 | POST "/api/generate"
Jul 15 01:09:41 amobile ollama[17467]: [GIN] 2025/07/15 - 01:09:41 | 200 | 2m42s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:12:48 amobile ollama[17467]: [GIN] 2025/07/15 - 01:12:48 | 200 | 2m0s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:19:34 amobile ollama[17467]: [GIN] 2025/07/15 - 01:19:34 | 200 | 2m42s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.026-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.029-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41521"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.148-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.149-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.168-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.168-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:41521"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.273-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 01:24:42 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.280-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.285-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.286-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.286-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.286-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.401-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.570-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:24:42 amobile ollama[17467]: time=2025-07-15T01:24:42.747-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:24:43 amobile ollama[17467]: time=2025-07-15T01:24:43.911-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.76 seconds"
Jul 15 01:27:35 amobile ollama[17467]: [GIN] 2025/07/15 - 01:27:35 | 200 | 2m53s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:33:05 amobile ollama[17467]: [GIN] 2025/07/15 - 01:33:05 | 200 | 37.98540726s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.619-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.621-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.716-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41437"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.716-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.716-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.717-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.734-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.736-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:41437"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.827-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 01:40:16 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.834-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.841-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 01:40:16 amobile ollama[17467]: time=2025-07-15T01:40:16.968-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 01:40:17 amobile ollama[17467]: time=2025-07-15T01:40:17.113-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:40:17 amobile ollama[17467]: time=2025-07-15T01:40:17.288-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 01:40:18 amobile ollama[17467]: time=2025-07-15T01:40:18.226-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.51 seconds"
Jul 15 01:41:38 amobile ollama[17467]: [GIN] 2025/07/15 - 01:41:38 | 200 | 1m22s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:44:40 amobile ollama[17467]: [GIN] 2025/07/15 - 01:44:40 | 200 | 1m3s | 127.0.0.1 | POST "/api/chat"
Jul 15 01:49:32 amobile ollama[17467]: [GIN] 2025/07/15 - 01:49:32 | 200 | 26.3552658s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:00:49 amobile ollama[17467]: [GIN] 2025/07/15 - 02:00:49 | 200 | 39.415µs | 127.0.0.1 | HEAD "/"
Jul 15 02:00:49 amobile ollama[17467]: [GIN] 2025/07/15 - 02:00:49 | 200 | 182.084631ms | 127.0.0.1 | POST "/api/show"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.063-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.6 GiB" free_swap="29.7 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.065-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.166-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 45897"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.167-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.167-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.167-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.185-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.185-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:45897"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.274-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 02:00:50 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.281-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.286-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.287-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.287-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.287-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.419-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.575-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 02:00:50 amobile ollama[17467]: time=2025-07-15T02:00:50.848-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 02:00:51 amobile ollama[17467]: time=2025-07-15T02:00:51.684-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.52 seconds"
Jul 15 02:00:51 amobile ollama[17467]: [GIN] 2025/07/15 - 02:00:51 | 200 | 1.910645146s | 127.0.0.1 | POST "/api/generate"
Jul 15 02:04:07 amobile ollama[17467]: [GIN] 2025/07/15 - 02:04:07 | 200 | 2m49s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:06:47 amobile ollama[17467]: [GIN] 2025/07/15 - 02:06:47 | 200 | 2m12s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:10:37 amobile ollama[17467]: [GIN] 2025/07/15 - 02:10:37 | 200 | 2m24s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:13:36 amobile ollama[17467]: [GIN] 2025/07/15 - 02:13:36 | 200 | 1m45s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:18:37 amobile ollama[17467]: [GIN] 2025/07/15 - 02:18:37 | 200 | 1m29s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:20:22 amobile ollama[17467]: [GIN] 2025/07/15 - 02:20:22 | 200 | 24.101449501s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:22:28 amobile ollama[17467]: [GIN] 2025/07/15 - 02:22:28 | 200 | 1m16s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:24:37 amobile ollama[17467]: [GIN] 2025/07/15 - 02:24:37 | 200 | 1m15s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:26:57 amobile ollama[17467]: [GIN] 2025/07/15 - 02:26:57 | 200 | 1m14s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:31:07 amobile ollama[17467]: [GIN] 2025/07/15 - 02:31:07 | 200 | 2m13s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:32:33 amobile ollama[17467]: [GIN] 2025/07/15 - 02:32:33 | 200 | 11.564672162s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:36:32 amobile ollama[17467]: [GIN] 2025/07/15 - 02:36:32 | 200 | 2m21s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:41:53 amobile ollama[17467]: [GIN] 2025/07/15 - 02:41:53 | 200 | 46.782257017s | 127.0.0.1 | POST "/api/chat"
Jul 15 02:43:59 amobile ollama[17467]: [GIN] 2025/07/15 - 02:43:59 | 200 | 1m3s | 127.0.0.1 | POST "/api/chat"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.463-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.0 GiB"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.465-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.561-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 41967"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.562-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.562-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.562-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.577-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.577-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:41967"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.664-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 10:58:17 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.671-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.677-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.814-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 10:58:17 amobile ollama[17467]: time=2025-07-15T10:58:17.966-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 10:58:18 amobile ollama[17467]: time=2025-07-15T10:58:18.140-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 10:58:19 amobile ollama[17467]: time=2025-07-15T10:58:19.581-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds"
Jul 15 10:58:23 amobile ollama[17467]: [GIN] 2025/07/15 - 10:58:23 | 200 | 6.619394623s | 127.0.0.1 | POST "/api/chat"
Jul 15 10:58:51 amobile ollama[17467]: [GIN] 2025/07/15 - 10:58:51 | 200 | 2.678771024s | 127.0.0.1 | POST "/api/chat"
Jul 15 10:59:28 amobile ollama[17467]: [GIN] 2025/07/15 - 10:59:28 | 200 | 16.793880245s | 127.0.0.1 | POST "/api/chat"
Jul 15 11:01:09 amobile ollama[17467]: [GIN] 2025/07/15 - 11:01:09 | 200 | 5.583076216s | 127.0.0.1 | POST "/api/chat"
Jul 15 12:24:46 amobile ollama[17467]: [GIN] 2025/07/15 - 12:24:46 | 200 | 56.614µs | 127.0.0.1 | HEAD "/"
Jul 15 12:24:46 amobile ollama[17467]: [GIN] 2025/07/15 - 12:24:46 | 200 | 187.47452ms | 127.0.0.1 | POST "/api/show"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.225-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.0 GiB"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.227-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 38447"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.325-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.340-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.341-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:38447"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.430-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 12:24:47 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.437-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.443-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.576-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 12:24:47 amobile ollama[17467]: time=2025-07-15T12:24:47.743-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 12:24:48 amobile ollama[17467]: time=2025-07-15T12:24:48.011-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 12:24:49 amobile ollama[17467]: time=2025-07-15T12:24:49.594-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.27 seconds"
Jul 15 12:24:49 amobile ollama[17467]: [GIN] 2025/07/15 - 12:24:49 | 200 | 2.646873422s | 127.0.0.1 | POST "/api/generate"
Jul 15 12:28:09 amobile ollama[17467]: [GIN] 2025/07/15 - 12:28:09 | 200 | 2m41s | 127.0.0.1 | POST "/api/chat"
Jul 15 12:30:20 amobile ollama[17467]: [GIN] 2025/07/15 - 12:30:20 | 200 | 46.245741541s | 127.0.0.1 | POST "/api/chat"
Jul 15 12:31:18 amobile ollama[17467]: [GIN] 2025/07/15 - 12:31:18 | 200 | 40.434622179s | 127.0.0.1 | POST "/api/chat"
Jul 15 12:32:04 amobile ollama[17467]: [GIN] 2025/07/15 - 12:32:04 | 200 | 11.571502803s | 127.0.0.1 | POST "/api/chat"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.853-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.3 GiB" free_swap="30.1 GiB"
Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.855-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 
GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 43873" Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding" Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.955-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding" Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.975-04:00 level=INFO source=runner.go:925 msg="starting ollama engine" Jul 15 14:05:59 amobile ollama[17467]: time=2025-07-15T14:05:59.976-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:43873" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.077-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 Jul 15 14:06:00 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.084-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 
CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.090-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.207-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.381-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 14:06:00 amobile ollama[17467]: time=2025-07-15T14:06:00.559-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 14:06:01 amobile ollama[17467]: time=2025-07-15T14:06:01.977-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds" Jul 15 14:06:14 amobile ollama[17467]: [GIN] 2025/07/15 - 14:06:14 | 200 | 15.243695865s | 127.0.0.1 | POST "/api/chat" Jul 15 14:08:49 amobile ollama[17467]: [GIN] 2025/07/15 - 14:08:49 | 200 | 7.81688474s | 127.0.0.1 | POST "/api/chat" Jul 15 14:10:32 amobile ollama[17467]: [GIN] 2025/07/15 - 14:10:32 | 200 | 5.57302473s | 127.0.0.1 | POST "/api/chat" Jul 15 14:11:52 amobile ollama[17467]: [GIN] 2025/07/15 - 14:11:52 | 200 | 4.563861397s | 127.0.0.1 | POST "/api/chat" Jul 15 14:12:21 amobile ollama[17467]: [GIN] 2025/07/15 - 14:12:21 | 200 | 7.445482222s | 127.0.0.1 | POST "/api/chat" Jul 15 14:13:40 amobile ollama[17467]: [GIN] 2025/07/15 - 14:13:40 | 200 | 8.141572187s | 
127.0.0.1 | POST "/api/chat" Jul 15 14:15:38 amobile ollama[17467]: [GIN] 2025/07/15 - 14:15:38 | 200 | 11.704692526s | 127.0.0.1 | POST "/api/chat" Jul 15 14:17:23 amobile ollama[17467]: [GIN] 2025/07/15 - 14:17:23 | 200 | 8.167364087s | 127.0.0.1 | POST "/api/chat" Jul 15 14:19:24 amobile ollama[17467]: [GIN] 2025/07/15 - 14:19:24 | 200 | 8.725596273s | 127.0.0.1 | POST "/api/chat" Jul 15 14:20:54 amobile ollama[17467]: [GIN] 2025/07/15 - 14:20:54 | 200 | 7.287166608s | 127.0.0.1 | POST "/api/chat" Jul 15 14:23:09 amobile ollama[17467]: [GIN] 2025/07/15 - 14:23:09 | 200 | 7.923928608s | 127.0.0.1 | POST "/api/chat" Jul 15 14:28:00 amobile ollama[17467]: [GIN] 2025/07/15 - 14:28:00 | 200 | 9.638210214s | 127.0.0.1 | POST "/api/chat" Jul 15 14:30:20 amobile ollama[17467]: [GIN] 2025/07/15 - 14:30:20 | 200 | 11.398374606s | 127.0.0.1 | POST "/api/chat" Jul 15 14:32:11 amobile ollama[17467]: [GIN] 2025/07/15 - 14:32:11 | 200 | 6.043563297s | 127.0.0.1 | POST "/api/chat" Jul 15 14:35:04 amobile ollama[17467]: [GIN] 2025/07/15 - 14:35:04 | 200 | 10.856669259s | 127.0.0.1 | POST "/api/chat" Jul 15 14:38:09 amobile ollama[17467]: [GIN] 2025/07/15 - 14:38:09 | 200 | 10.15686096s | 127.0.0.1 | POST "/api/chat" Jul 15 14:39:15 amobile ollama[17467]: [GIN] 2025/07/15 - 14:39:15 | 200 | 6.940100831s | 127.0.0.1 | POST "/api/chat" Jul 15 14:41:27 amobile ollama[17467]: [GIN] 2025/07/15 - 14:41:27 | 200 | 12.226914926s | 127.0.0.1 | POST "/api/chat" Jul 15 14:44:28 amobile ollama[17467]: [GIN] 2025/07/15 - 14:44:28 | 200 | 23.563901162s | 127.0.0.1 | POST "/api/chat" Jul 15 14:48:26 amobile ollama[17467]: [GIN] 2025/07/15 - 14:48:26 | 200 | 33.467552018s | 127.0.0.1 | POST "/api/chat" Jul 15 14:50:38 amobile ollama[17467]: [GIN] 2025/07/15 - 14:50:38 | 200 | 32.005724467s | 127.0.0.1 | POST "/api/chat" Jul 15 14:52:51 amobile ollama[17467]: [GIN] 2025/07/15 - 14:52:51 | 200 | 34.606237099s | 127.0.0.1 | POST "/api/chat" Jul 15 14:55:03 amobile ollama[17467]: [GIN] 2025/07/15 - 
14:55:03 | 200 | 39.147461074s | 127.0.0.1 | POST "/api/chat" Jul 15 14:57:11 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:11 | 200 | 37.597258006s | 127.0.0.1 | POST "/api/chat" Jul 15 14:57:45 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:45 | 200 | 60.643µs | 127.0.0.1 | HEAD "/" Jul 15 14:57:45 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:45 | 200 | 75.829µs | 127.0.0.1 | GET "/api/ps" Jul 15 14:57:49 amobile ollama[17467]: [GIN] 2025/07/15 - 14:57:49 | 200 | 5.326923899s | 127.0.0.1 | POST "/api/chat" Jul 15 14:58:40 amobile ollama[17467]: [GIN] 2025/07/15 - 14:58:40 | 200 | 39.146µs | 127.0.0.1 | HEAD "/" Jul 15 14:58:40 amobile ollama[17467]: [GIN] 2025/07/15 - 14:58:40 | 200 | 37.34µs | 127.0.0.1 | GET "/api/ps" Jul 15 15:31:24 amobile ollama[17467]: [GIN] 2025/07/15 - 15:31:24 | 200 | 39.551µs | 127.0.0.1 | HEAD "/" Jul 15 15:31:24 amobile ollama[17467]: [GIN] 2025/07/15 - 15:31:24 | 200 | 193.354766ms | 127.0.0.1 | POST "/api/show" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.125-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.127-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model 
/usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 35539" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.223-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.237-04:00 level=INFO source=runner.go:925 msg="starting ollama engine" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.237-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:35539" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.331-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 Jul 15 15:31:25 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.337-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU" Jul 15 15:31:25 amobile 
ollama[17467]: time=2025-07-15T15:31:25.343-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.475-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.638-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 15:31:25 amobile ollama[17467]: time=2025-07-15T15:31:25.913-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 15:31:27 amobile ollama[17467]: time=2025-07-15T15:31:27.248-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds" Jul 15 15:31:27 amobile ollama[17467]: [GIN] 2025/07/15 - 15:31:27 | 200 | 2.458372036s | 127.0.0.1 | POST "/api/generate" Jul 15 15:32:06 amobile ollama[17467]: [GIN] 2025/07/15 - 15:32:06 | 200 | 12.149772723s | 127.0.0.1 | POST "/api/chat" Jul 15 15:32:50 amobile ollama[17467]: [GIN] 2025/07/15 - 15:32:50 | 200 | 19.018623689s | 127.0.0.1 | POST "/api/chat" Jul 15 15:33:40 amobile ollama[17467]: [GIN] 2025/07/15 - 15:33:40 | 200 | 19.08334206s | 127.0.0.1 | POST "/api/chat" Jul 15 15:34:59 amobile ollama[17467]: [GIN] 2025/07/15 - 15:34:59 | 200 | 26.413488556s | 127.0.0.1 | POST "/api/chat" Jul 15 15:37:22 amobile ollama[17467]: [GIN] 2025/07/15 - 15:37:22 | 200 | 27.871657486s | 127.0.0.1 | POST "/api/chat" Jul 15 15:38:35 amobile ollama[17467]: [GIN] 2025/07/15 - 15:38:35 | 200 | 27.072325445s | 127.0.0.1 | POST "/api/chat" Jul 15 15:40:25 amobile ollama[17467]: [GIN] 2025/07/15 - 15:40:25 | 200 | 38.748835117s | 127.0.0.1 | POST "/api/chat" Jul 15 15:42:57 amobile ollama[17467]: [GIN] 2025/07/15 - 15:42:57 | 200 | 3.887759995s | 127.0.0.1 | POST "/api/chat" Jul 15 15:47:15 amobile ollama[17467]: [GIN] 2025/07/15 - 15:47:15 | 200 | 32.823251665s | 127.0.0.1 | POST "/api/chat" Jul 
15 15:47:48 amobile ollama[17467]: [GIN] 2025/07/15 - 15:47:48 | 200 | 8.797014247s | 127.0.0.1 | POST "/api/chat" Jul 15 15:48:16 amobile ollama[17467]: [GIN] 2025/07/15 - 15:48:16 | 200 | 23.792104523s | 127.0.0.1 | POST "/api/chat" Jul 15 15:48:38 amobile ollama[17467]: [GIN] 2025/07/15 - 15:48:38 | 200 | 9.21259348s | 127.0.0.1 | POST "/api/chat" Jul 15 15:49:31 amobile ollama[17467]: [GIN] 2025/07/15 - 15:49:31 | 200 | 21.710448987s | 127.0.0.1 | POST "/api/chat" Jul 15 15:49:35 amobile ollama[17467]: [GIN] 2025/07/15 - 15:49:35 | 200 | 32.825344864s | 127.0.0.1 | POST "/api/chat" Jul 15 15:49:40 amobile ollama[17467]: [GIN] 2025/07/15 - 15:49:40 | 200 | 7.626245474s | 127.0.0.1 | POST "/api/chat" Jul 15 15:50:47 amobile ollama[17467]: [GIN] 2025/07/15 - 15:50:47 | 200 | 6.319395124s | 127.0.0.1 | POST "/api/chat" Jul 15 15:50:55 amobile ollama[17467]: [GIN] 2025/07/15 - 15:50:55 | 200 | 16.482319877s | 127.0.0.1 | POST "/api/chat" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.011-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.1 GiB" free_swap="30.1 GiB" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.013-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.108-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model 
/usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 40683" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.108-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.109-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.109-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.136-04:00 level=INFO source=runner.go:925 msg="starting ollama engine" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.136-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:40683" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.233-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 Jul 15 16:00:37 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.240-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU" Jul 15 16:00:37 amobile 
ollama[17467]: time=2025-07-15T16:00:37.245-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.361-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.537-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 16:00:37 amobile ollama[17467]: time=2025-07-15T16:00:37.714-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 16:00:39 amobile ollama[17467]: time=2025-07-15T16:00:39.125-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.02 seconds" Jul 15 16:00:50 amobile ollama[17467]: [GIN] 2025/07/15 - 16:00:50 | 200 | 13.453382374s | 127.0.0.1 | POST "/api/chat" Jul 15 16:01:37 amobile ollama[17467]: [GIN] 2025/07/15 - 16:01:37 | 200 | 7.010397526s | 127.0.0.1 | POST "/api/chat" Jul 15 16:02:47 amobile ollama[17467]: [GIN] 2025/07/15 - 16:02:47 | 200 | 9.218088928s | 127.0.0.1 | POST "/api/chat" Jul 15 16:03:40 amobile ollama[17467]: [GIN] 2025/07/15 - 16:03:40 | 200 | 8.419104993s | 127.0.0.1 | POST "/api/chat" Jul 15 16:06:18 amobile ollama[17467]: [GIN] 2025/07/15 - 16:06:18 | 200 | 44.640608373s | 127.0.0.1 | POST "/api/chat" Jul 15 16:06:50 amobile ollama[17467]: [GIN] 2025/07/15 - 16:06:50 | 200 | 17.14751161s | 127.0.0.1 | POST "/api/chat" Jul 15 16:07:39 amobile ollama[17467]: [GIN] 2025/07/15 - 16:07:39 | 200 | 38.689371786s | 127.0.0.1 | POST "/api/chat" Jul 15 16:08:12 amobile ollama[17467]: [GIN] 2025/07/15 - 16:08:12 | 200 | 32.899070555s | 127.0.0.1 | POST "/api/chat" Jul 15 16:08:34 amobile ollama[17467]: time=2025-07-15T16:08:34.345-04:00 level=WARN source=runner.go:157 msg="truncating input prompt" limit=4096 prompt=19010 keep=4 new=4096 Jul 15 16:09:46 amobile ollama[17467]: [GIN] 2025/07/15 - 16:09:46 | 200 
| 25.107300894s | 127.0.0.1 | POST "/api/chat" Jul 15 16:12:18 amobile ollama[17467]: [GIN] 2025/07/15 - 16:12:18 | 200 | 3m43s | 127.0.0.1 | POST "/api/chat" Jul 15 16:12:19 amobile ollama[17467]: [GIN] 2025/07/15 - 16:12:19 | 200 | 1m36s | 127.0.0.1 | POST "/api/chat" Jul 15 16:15:05 amobile ollama[17467]: [GIN] 2025/07/15 - 16:15:05 | 200 | 1m53s | 127.0.0.1 | POST "/api/chat" Jul 15 16:15:38 amobile ollama[17467]: [GIN] 2025/07/15 - 16:15:38 | 200 | 1m14s | 127.0.0.1 | POST "/api/chat" Jul 15 16:15:48 amobile ollama[17467]: [GIN] 2025/07/15 - 16:15:48 | 200 | 33.070491476s | 127.0.0.1 | POST "/api/chat" Jul 15 16:17:31 amobile ollama[17467]: [GIN] 2025/07/15 - 16:17:31 | 200 | 1m0s | 127.0.0.1 | POST "/api/chat" Jul 15 16:21:06 amobile ollama[17467]: [GIN] 2025/07/15 - 16:21:06 | 200 | 34.303595005s | 127.0.0.1 | POST "/api/chat" Jul 15 16:21:32 amobile ollama[17467]: panic: failed to decode batch: could not find a kv cache slot (cache: 2560 batch: 512) Jul 15 16:21:32 amobile ollama[17467]: goroutine 8 [running]: Jul 15 16:21:32 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0002b6900, {0x55867a095700, 0xc0000008c0}) Jul 15 16:21:32 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner/runner.go:364 +0x65 Jul 15 16:21:32 amobile ollama[17467]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 Jul 15 16:21:32 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74 Jul 15 16:21:32 amobile ollama[17467]: time=2025-07-15T16:21:32.337-04:00 level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:40683/completion\": EOF" Jul 15 16:21:32 amobile ollama[17467]: [GIN] 2025/07/15 - 16:21:32 | 500 | 11.994152362s | 127.0.0.1 | POST "/api/chat" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.458-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB" Jul 15 
16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.461-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.562-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 45949" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.562-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.562-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.563-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.581-04:00 level=INFO source=runner.go:925 msg="starting ollama engine" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.582-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:45949" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.678-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 Jul 15 16:22:39 amobile ollama[17467]: 
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.684-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.690-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.814-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model" Jul 15 16:22:39 amobile ollama[17467]: time=2025-07-15T16:22:39.973-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 16:22:40 amobile ollama[17467]: time=2025-07-15T16:22:40.238-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB" Jul 15 16:22:41 amobile ollama[17467]: time=2025-07-15T16:22:41.844-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.28 seconds" Jul 15 16:23:21 amobile ollama[17467]: [GIN] 2025/07/15 - 16:23:21 | 200 | 42.358125818s | 127.0.0.1 | POST "/api/chat" Jul 15 16:35:49 amobile ollama[17467]: time=2025-07-15T16:35:49.962-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB" Jul 15 16:35:49 amobile ollama[17467]: time=2025-07-15T16:35:49.964-04:00 level=INFO source=server.go:175 
msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.062-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 36385" Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.062-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.063-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding" Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.063-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding" Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.080-04:00 level=INFO source=runner.go:925 msg="starting ollama engine" Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.081-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:36385" Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.178-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36 Jul 15 16:35:50 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so Jul 15 16:35:50 amobile 
ollama[17467]: time=2025-07-15T16:35:50.186-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.193-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.314-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.481-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:35:50 amobile ollama[17467]: time=2025-07-15T16:35:50.656-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:35:52 amobile ollama[17467]: time=2025-07-15T16:35:52.094-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.03 seconds"
Jul 15 16:36:34 amobile ollama[17467]: [GIN] 2025/07/15 - 16:36:34 | 200 | 44.698539465s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:37:39 amobile ollama[17467]: [GIN] 2025/07/15 - 16:37:39 | 200 | 15.367562581s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:39:19 amobile ollama[17467]: [GIN] 2025/07/15 - 16:39:19 | 200 | 58.985695818s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:39:26 amobile ollama[17467]: [GIN] 2025/07/15 - 16:39:26 | 200 | 1m4s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:40:27 amobile ollama[17467]: [GIN] 2025/07/15 - 16:40:27 | 200 | 32.498111154s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:41:19 amobile ollama[17467]: [GIN] 2025/07/15 - 16:41:19 | 200 | 25.790679315s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:42:42 amobile ollama[17467]: [GIN] 2025/07/15 - 16:42:42 | 200 | 29.962983809s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:44:26 amobile ollama[17467]: [GIN] 2025/07/15 - 16:44:26 | 200 | 20.049622997s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:45:03 amobile ollama[17467]: [GIN] 2025/07/15 - 16:45:03 | 200 | 11.432715593s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:45:30 amobile ollama[17467]: [GIN] 2025/07/15 - 16:45:30 | 200 | 15.81967878s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:47:41 amobile ollama[17467]: [GIN] 2025/07/15 - 16:47:41 | 200 | 1m15s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:49:29 amobile ollama[17467]: [GIN] 2025/07/15 - 16:49:29 | 200 | 3m9s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:51:05 amobile ollama[17467]: [GIN] 2025/07/15 - 16:51:05 | 200 | 14.459985878s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:52:31 amobile ollama[17467]: panic: failed to decode batch: could not find a kv cache slot (cache: 2560 batch: 512)
Jul 15 16:52:31 amobile ollama[17467]: goroutine 14 [running]:
Jul 15 16:52:31 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc00065e900, {0x555e3641c700, 0xc0001308c0})
Jul 15 16:52:31 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:364 +0x65
Jul 15 16:52:31 amobile ollama[17467]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
Jul 15 16:52:31 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74
Jul 15 16:52:31 amobile ollama[17467]: time=2025-07-15T16:52:31.866-04:00 level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:36385/completion\": EOF"
Jul 15 16:52:31 amobile ollama[17467]: [GIN] 2025/07/15 - 16:52:31 | 200 | 13.142628839s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.611-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.613-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 39263"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.724-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.739-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.740-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:39263"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.836-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 16:53:23 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.842-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.848-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 16:53:23 amobile ollama[17467]: time=2025-07-15T16:53:23.976-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 16:53:24 amobile ollama[17467]: time=2025-07-15T16:53:24.139-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:53:24 amobile ollama[17467]: time=2025-07-15T16:53:24.409-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 16:53:26 amobile ollama[17467]: time=2025-07-15T16:53:26.009-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.28 seconds"
Jul 15 16:54:38 amobile ollama[17467]: [GIN] 2025/07/15 - 16:54:38 | 200 | 1m15s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:56:20 amobile ollama[17467]: [GIN] 2025/07/15 - 16:56:20 | 200 | 1m9s | 127.0.0.1 | POST "/api/chat"
Jul 15 16:59:05 amobile ollama[17467]: [GIN] 2025/07/15 - 16:59:05 | 200 | 1m34s | 127.0.0.1 | POST "/api/chat"
Jul 15 17:00:40 amobile ollama[17467]: [GIN] 2025/07/15 - 17:00:40 | 200 | 1m33s | 127.0.0.1 | POST "/api/chat"
Jul 15 17:01:10 amobile ollama[17467]: [GIN] 2025/07/15 - 17:01:10 | 200 | 140.204µs | 127.0.0.1 | GET "/api/version"
Jul 15 17:02:26 amobile ollama[17467]: [GIN] 2025/07/15 - 17:02:26 | 200 | 1m37s | 127.0.0.1 | POST "/api/chat"
Jul 15 17:35:05 amobile ollama[17467]: [GIN] 2025/07/15 - 17:35:05 | 200 | 52.773µs | 127.0.0.1 | HEAD "/"
Jul 15 17:35:05 amobile ollama[17467]: [GIN] 2025/07/15 - 17:35:05 | 200 | 204.188719ms | 127.0.0.1 | POST "/api/show"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.674-04:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="27.2 GiB" free_swap="30.1 GiB"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.676-04:00 level=INFO source=server.go:175 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[27.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.1 GiB" memory.required.partial="0 B" memory.required.kv="450.0 MiB" memory.required.allocations="[5.1 GiB]" memory.weights.total="2.3 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="517.0 MiB" memory.graph.partial="1.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 8192 --batch-size 512 --threads 28 --no-mmap --parallel 2 --port 36275"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.774-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.788-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.790-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:36275"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.880-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
Jul 15 17:35:05 amobile ollama[17467]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.888-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:359 msg="offloading 0 repeating layers to GPU"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:363 msg="offloading output layer to CPU"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:375 msg="offloaded 0/35 layers to GPU"
Jul 15 17:35:05 amobile ollama[17467]: time=2025-07-15T17:35:05.893-04:00 level=INFO source=ggml.go:377 msg="model weights" buffer=CPU size="3.6 GiB"
Jul 15 17:35:06 amobile ollama[17467]: time=2025-07-15T17:35:06.025-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
Jul 15 17:35:06 amobile ollama[17467]: time=2025-07-15T17:35:06.174-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 17:35:06 amobile ollama[17467]: time=2025-07-15T17:35:06.447-04:00 level=INFO source=ggml.go:666 msg="compute graph" backend=CPU buffer_type=CPU size="1.1 GiB"
Jul 15 17:35:08 amobile ollama[17467]: time=2025-07-15T17:35:08.049-04:00 level=INFO source=server.go:637 msg="llama runner started in 2.27 seconds"
Jul 15 17:35:08 amobile ollama[17467]: [GIN] 2025/07/15 - 17:35:08 | 200 | 2.657473117s | 127.0.0.1 | POST "/api/generate"
Jul 15 17:39:08 amobile ollama[17467]: [GIN] 2025/07/15 - 17:39:08 | 200 | 2m2s | 127.0.0.1 | POST "/api/chat"
Jul 15 17:41:57 amobile ollama[17467]: [GIN] 2025/07/15 - 17:41:57 | 200 | 42.836773699s | 127.0.0.1 | POST "/api/chat"
Jul 15 17:43:41 amobile ollama[17467]: [GIN] 2025/07/15 - 17:43:41 | 200 | 31.474378088s | 127.0.0.1 | POST "/api/chat"
```

@rick-github commented on GitHub (Jul 15, 2025):

```
Jul 15 16:21:32 amobile ollama[17467]: panic: failed to decode batch: could not find a kv cache slot (cache: 2560 batch: 512)
Jul 15 16:21:32 amobile ollama[17467]: goroutine 8 [running]:
Jul 15 16:21:32 amobile ollama[17467]: github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0002b6900, {0x55867a095700, 0xc0000008c0})
Jul 15 16:21:32 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:364 +0x65
Jul 15 16:21:32 amobile ollama[17467]: created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
Jul 15 16:21:32 amobile ollama[17467]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:960 +0xa74
Jul 15 16:21:32 amobile ollama[17467]: time=2025-07-15T16:21:32.337-04:00 level=ERROR source=server.go:807 msg="post predict" error="Post \"http://127.0.0.1:40683/completion\": EOF"
```

Could be #10127. A workaround is to set `OLLAMA_NUM_PARALLEL=1`.
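For a systemd-managed install like the one in the logs, one way to apply the workaround is a drop-in override for the service unit. A minimal sketch, assuming the unit is named `ollama.service` (adjust for your install):

```shell
# Set OLLAMA_NUM_PARALLEL=1 for the ollama systemd service via a drop-in
# override, then reload and restart so the runner picks up the new value.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_NUM_PARALLEL=1"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

With `--parallel 1` the runner no longer splits the context window across concurrent sequences, which avoids the slot contention seen in the panic.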


Reference: github-starred/ollama#54061