[GH-ISSUE #10688] Memory leak during inference using Gemma3 with structured output #69085

Closed
opened 2026-05-04 17:07:34 -05:00 by GiteaMirror · 9 comments
Owner

Originally created by @leokeba on GitHub (May 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10688

Originally assigned to: @ParthSareen on GitHub.

What is the issue?

I am seeing what looks like a significant memory leak when using Ollama with structured output and gemma3:12b on a Debian host. It's running on an RTX 3090 Ti with default settings.

Memory (system RAM) usage grows very fast (about 30 MB/s) and keeps growing until it saturates the host and the machine hangs. I then need to restart the ollama server, and the same thing happens as soon as I start inferencing again. If I stop the inference script, memory usage does not go down; I always have to kill the process to free the RAM.
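For reference, the ~30 MB/s growth rate can be measured by sampling the server process's RSS from /proc on the Debian host. A minimal sketch (the helper names are mine, not from the original report):

```python
import re
import time


def parse_vmrss_kib(status_text: str) -> int:
    """Extract VmRSS (in KiB) from the contents of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def sample_rss(pid: int, interval_s: float = 5.0, samples: int = 12) -> None:
    """Print RSS and its growth rate for the given pid, once per interval."""
    prev = None
    for _ in range(samples):
        with open(f"/proc/{pid}/status") as f:
            rss_kib = parse_vmrss_kib(f.read())
        if prev is not None:
            rate_mb_s = (rss_kib - prev) / 1024 / interval_s
            print(f"RSS {rss_kib / 1024:.1f} MiB  (~{rate_mb_s:.1f} MB/s)")
        prev = rss_kib
        time.sleep(interval_s)
```

Running `sample_rss()` against the `ollama runner` PID while the inference loop is active should show the per-second growth directly.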

I'm attaching the log output from the server. Is anything else needed to look into this?
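For anyone trying to reproduce: the requests in the log are ordinary /api/chat calls with a JSON-schema `format` field (Ollama's structured-output mechanism). A sketch of what the inference loop may have looked like; the schema, prompt, and function names here are placeholders, since the actual script was not attached:

```python
import json
import urllib.request

# Placeholder schema; the real script's schema was not attached to the issue.
SCHEMA = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an /api/chat payload requesting structured (JSON-schema) output."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "format": SCHEMA,  # constrain generation to this JSON schema
        "stream": False,
    }


def chat(host: str, payload: dict) -> dict:
    """POST the payload to the Ollama /api/chat endpoint and return the reply."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_chat_request("gemma3:12b", "Describe this item.")
    # Repeating calls like this once per second matches the log's cadence,
    # and the server's RSS climbs until system RAM is exhausted:
    print(chat("http://127.0.0.1:11434", payload))
```

The leak reportedly appears only when the `format` field is set, which points at the grammar/constraint machinery rather than plain generation.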

Relevant log output

leo@debian-nvidia:~$ ollama serve
2025/05/13 16:48:00 routes.go:1233: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/leo/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-05-13T16:48:00.561+02:00 level=INFO source=images.go:463 msg="total blobs: 33"
time=2025-05-13T16:48:00.562+02:00 level=INFO source=images.go:470 msg="total unused blobs removed: 0"
time=2025-05-13T16:48:00.562+02:00 level=INFO source=routes.go:1300 msg="Listening on 127.0.0.1:11434 (version 0.6.8)"
time=2025-05-13T16:48:00.562+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-05-13T16:48:00.784+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 library=cuda variant=v12 compute=8.6 driver=12.8 name="NVIDIA GeForce RTX 3090 Ti" total="23.6 GiB" available="23.3 GiB"
[GIN] 2025/05/13 - 16:48:37 | 200 |     980.376µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:37.849+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:48:37.996+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:48:38.026+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:48:38.028+02:00 level=INFO source=sched.go:754 msg="new model will fit in available VRAM in single GPU, loading" model=/home/leo/.ollama/models/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de gpu=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 parallel=2 available=24974786560 required="11.0 GiB"
time=2025-05-13T16:48:38.137+02:00 level=INFO source=server.go:106 msg="system memory" total="31.2 GiB" free="29.3 GiB" free_swap="542.1 MiB"
time=2025-05-13T16:48:38.139+02:00 level=INFO source=server.go:139 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=49 layers.split="" memory.available="[23.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="11.0 GiB" memory.required.partial="11.0 GiB" memory.required.kv="1.3 GiB" memory.required.allocations="[11.0 GiB]" memory.weights.total="6.8 GiB" memory.weights.repeating="6.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="519.5 MiB" memory.graph.partial="1.3 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-13T16:48:38.192+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:48:38.193+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-13T16:48:38.197+02:00 level=INFO source=server.go:410 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/leo/.ollama/models/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de --ctx-size 8192 --batch-size 512 --n-gpu-layers 49 --threads 10 --parallel 2 --port 33857"
time=2025-05-13T16:48:38.197+02:00 level=INFO source=sched.go:452 msg="loaded runners" count=1
time=2025-05-13T16:48:38.197+02:00 level=INFO source=server.go:589 msg="waiting for llama runner to start responding"
time=2025-05-13T16:48:38.197+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server not responding"
time=2025-05-13T16:48:38.205+02:00 level=INFO source=runner.go:851 msg="starting ollama engine"
time=2025-05-13T16:48:38.205+02:00 level=INFO source=runner.go:914 msg="Server listening on 127.0.0.1:33857"
time=2025-05-13T16:48:38.255+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:48:38.256+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-13T16:48:38.256+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-13T16:48:38.256+02:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=37
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
time=2025-05-13T16:48:38.313+02:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-05-13T16:48:38.385+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CUDA0 size="7.6 GiB"
time=2025-05-13T16:48:38.385+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="787.5 MiB"
time=2025-05-13T16:48:38.449+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-13T16:48:43.663+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-13T16:48:43.694+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="308.0 MiB"
time=2025-05-13T16:48:43.694+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CPU buffer_type=CPU size="7.5 MiB"
time=2025-05-13T16:48:43.713+02:00 level=INFO source=server.go:628 msg="llama runner started in 5.52 seconds"
[GIN] 2025/05/13 - 16:48:45 | 200 |  7.544991459s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:45 | 200 |     858.796µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:45.387+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:46 | 200 |  990.816217ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:46 | 200 |     821.351µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:46.381+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:47 | 200 |  768.481138ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:47 | 200 |       828.9µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:47.153+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:47 | 200 |  762.318591ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:47 | 200 |     813.687µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:47.918+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:48 | 200 |  763.796017ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:48 | 200 |     935.834µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:48.686+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:49 | 200 |  762.516804ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:49 | 200 |     835.289µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:49.451+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:50 | 200 |  755.520694ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:50 | 200 |     843.444µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:50.209+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:50 | 200 |  754.848081ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:50 | 200 |     861.046µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:50.969+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:51 | 200 |  745.156696ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:51 | 200 |     863.531µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:51.716+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[... the same three-line pattern repeats for the rest of the capture: a POST "/api/chat" taking roughly 0.75–1.5 s, a GET "/api/tags", and the general.alignment warning, about once per second from 16:48:52 through 16:50:10 ...]
[GIN] 2025/05/13 - 16:50:10 | 200 |     838.319µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:10.921+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:11 | 200 |  1.050320821s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:11 | 200 |      814.57µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:11.975+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:12 | 200 |  985.961573ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:12 | 200 |     904.037µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:12.964+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:14 | 200 |  1.193615905s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:14 | 200 |     820.404µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:14.160+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:15 | 200 |  959.403609ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:15 | 200 |     831.935µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:15.124+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:16 | 200 |  1.176947939s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:16 | 200 |     831.985µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:16.303+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:17 | 200 |  923.669162ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:17 | 200 |     784.075µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:17.230+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:18 | 200 |   969.61224ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:18 | 200 |     806.076µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:18.203+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:19 | 200 |  1.001123719s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:19 | 200 |     812.874µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:19.207+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:20 | 200 |  948.493703ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:20 | 200 |     829.258µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:20.159+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:21 | 200 |  929.241283ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:21 | 200 |     857.089µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:21.092+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:22 | 200 |  1.107269829s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:22 | 200 |     805.287µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:22.202+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:23 | 200 |  1.089536435s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:23 | 200 |     836.418µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:23.295+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:23 | 500 |   494.37497ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:36 | 200 |     819.058µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:36.739+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:36.894+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:36.925+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:36.929+02:00 level=INFO source=sched.go:517 msg="updated VRAM based on existing loaded models" gpu=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 library=cuda total="23.6 GiB" available="12.6 GiB"
time=2025-05-13T16:51:36.929+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
time=2025-05-13T16:51:36.930+02:00 level=INFO source=sched.go:754 msg="new model will fit in available VRAM in single GPU, loading" model=/home/leo/.ollama/models/blobs/sha256-2fc97d5d1c3217ac90931d7992762a418e09b2c5c49c030c6c1f9e8fbb221013 gpu=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 parallel=2 available=13522255936 required="10.0 GiB"
time=2025-05-13T16:51:37.047+02:00 level=INFO source=server.go:106 msg="system memory" total="31.2 GiB" free="25.2 GiB" free_swap="542.6 MiB"
time=2025-05-13T16:51:37.047+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
time=2025-05-13T16:51:37.048+02:00 level=INFO source=server.go:139 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=49 layers.split="" memory.available="[12.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="10.0 GiB" memory.required.partial="10.0 GiB" memory.required.kv="1.3 GiB" memory.required.allocations="[10.0 GiB]" memory.weights.total="7.6 GiB" memory.weights.repeating="6.8 GiB" memory.weights.nonrepeating="787.7 MiB" memory.graph.full="519.6 MiB" memory.graph.partial="1.3 GiB"
time=2025-05-13T16:51:37.098+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:37.099+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.num_channels default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.embedding_length default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.head_count default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.layer_norm_epsilon default=0
time=2025-05-13T16:51:37.103+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-13T16:51:37.103+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-13T16:51:37.103+02:00 level=INFO source=server.go:410 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/leo/.ollama/models/blobs/sha256-2fc97d5d1c3217ac90931d7992762a418e09b2c5c49c030c6c1f9e8fbb221013 --ctx-size 8192 --batch-size 512 --n-gpu-layers 49 --threads 10 --parallel 2 --port 42857"
time=2025-05-13T16:51:37.103+02:00 level=INFO source=sched.go:452 msg="loaded runners" count=2
time=2025-05-13T16:51:37.104+02:00 level=INFO source=server.go:589 msg="waiting for llama runner to start responding"
time=2025-05-13T16:51:37.104+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server not responding"
time=2025-05-13T16:51:37.111+02:00 level=INFO source=runner.go:851 msg="starting ollama engine"
time=2025-05-13T16:51:37.112+02:00 level=INFO source=runner.go:914 msg="Server listening on 127.0.0.1:42857"
time=2025-05-13T16:51:37.165+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:37.166+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-13T16:51:37.166+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-13T16:51:37.166+02:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1737 num_key_values=32
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
time=2025-05-13T16:51:37.204+02:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-05-13T16:51:37.285+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CUDA0 size="8.4 GiB"
time=2025-05-13T16:51:37.285+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="787.7 MiB"
time=2025-05-13T16:51:37.355+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-13T16:51:38.979+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.num_channels default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.embedding_length default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.head_count default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.layer_norm_epsilon default=0
time=2025-05-13T16:51:38.983+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-13T16:51:38.983+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-13T16:51:39.009+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="308.0 MiB"
time=2025-05-13T16:51:39.009+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CPU buffer_type=CPU size="7.5 MiB"
time=2025-05-13T16:51:39.109+02:00 level=INFO source=server.go:628 msg="llama runner started in 2.01 seconds"
[GIN] 2025/05/13 - 16:51:40 | 200 |  3.625931541s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:40 | 200 |    1.304686ms |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:40.368+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:41 | 200 |  1.173288276s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:41 | 200 |     884.953µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:41.545+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:42 | 200 |  1.093041205s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:42 | 200 |     876.112µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:42.641+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:43 | 200 |  1.190148307s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:43 | 200 |     927.104µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:43.835+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:44 | 200 |  1.117178999s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:44 | 200 |     805.534µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:44.955+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:46 | 200 |  1.188636665s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:46 | 200 |     1.21855ms |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:46.148+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:47 | 200 |  1.126001863s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:47 | 200 |     842.821µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:47.277+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:48 | 200 |  1.062130448s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:48 | 200 |      853.01µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:48.342+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:49 | 200 |  1.192205675s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:49 | 200 |     828.288µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:49.537+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:50 | 200 |  1.064617413s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:50 | 200 |     826.038µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:50.605+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:51 | 200 |  1.181298946s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:51 | 200 |     912.704µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:51.792+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:52 | 200 |  1.157731531s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:52 | 200 |     827.216µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:52.951+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:54 | 200 |  1.144311709s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:54 | 200 |     852.097µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:54.098+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:55 | 200 |  1.151178661s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:55 | 200 |     823.791µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:55.253+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:56 | 200 |  1.098051386s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:56 | 200 |     830.556µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:56.354+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:57 | 200 |  1.095882359s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:57 | 200 |     823.501µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:57.455+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:58 | 200 |  1.192077006s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:58 | 200 |     830.431µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:58.649+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:59 | 200 |  1.179899598s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:59 | 200 |     865.571µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:59.831+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:00 | 200 |  1.190354606s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:00 | 200 |     823.648µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:01.025+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:02 | 200 |  1.058573208s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:02 | 200 |     808.631µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:02.087+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:03 | 200 |  1.189769749s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:03 | 200 |     843.265µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:03.282+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:04 | 200 |  1.148332443s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:04 | 200 |     828.562µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:04.432+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:05 | 200 |  1.116938469s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:05 | 200 |     840.112µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:05.552+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:06 | 200 |  1.185360654s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:06 | 200 |     830.644µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:06.741+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:07 | 200 |  1.187190771s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:07 | 200 |     837.899µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:07.931+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:09 | 200 |  1.200790074s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:09 | 200 |     899.137µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:09.135+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:10 | 200 |  1.135805072s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:10 | 200 |     1.33389ms |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:10.275+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:11 | 200 |  1.178719063s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:11 | 200 |     792.231µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:11.457+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:12 | 200 |  1.068732647s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:12 | 200 |    1.358903ms |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:12.530+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:13 | 200 |  1.084584064s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:13 | 200 |     815.049µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:13.617+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:14 | 200 |  1.099273485s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:14 | 200 |     872.335µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:14.720+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:15 | 200 |  1.291627781s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:15 | 200 |     853.691µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:16.017+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:17 | 200 |  1.165006914s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:17 | 200 |     857.917µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:17.183+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:18 | 200 |  1.185455866s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:18 | 200 |      851.02µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:18.372+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:19 | 200 |  1.182260809s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:19 | 200 |     833.657µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:19.558+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:20 | 200 |  1.100934571s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:20 | 200 |     838.036µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:20.662+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:21 | 200 |  1.067580922s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:21 | 200 |      845.81µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:21.734+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:22 | 200 |  824.388844ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:22 | 200 |     851.607µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:22.561+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:23 | 200 |  1.157841226s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:23 | 200 |     808.772µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:23.722+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:24 | 200 |  1.226060831s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:24 | 200 |     1.02996ms |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:24.951+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:26 | 200 |  1.300700119s |       127.0.0.1 | POST     "/api/chat"

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.6.8
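
For anyone trying to reproduce this: the report describes a script looping chat requests with structured output against gemma3-12b while the server's RAM grows. The original script isn't attached, so this is only a minimal sketch of that kind of loop, assuming the plain HTTP API on the default port, a pulled `gemma3:12b` model, and an illustrative JSON schema (all hypothetical, not the reporter's actual code):

```python
# Hypothetical minimal reproduction sketch -- not the reporter's script.
# Assumes an ollama server on 127.0.0.1:11434 with gemma3:12b available.
import requests

# Illustrative structured-output schema; the real one is not shown in the issue.
SCHEMA = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

def chat_once():
    """Send one /api/chat request with structured output (format=schema)."""
    r = requests.post(
        "http://127.0.0.1:11434/api/chat",
        json={
            "model": "gemma3:12b",
            "messages": [{"role": "user", "content": "Describe a cat."}],
            "format": SCHEMA,  # structured output is what triggers the leak
            "stream": False,
        },
        timeout=120,
    )
    r.raise_for_status()
    return r.json()

def run_loop(n=1000):
    """Driving loop; watch the server's RSS while this runs."""
    for _ in range(n):
        chat_once()
```

Watching the `ollama serve` process RSS (e.g. with `top`) while such a loop runs should show the continuous growth described above if the leak is present.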

found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:49 | 200 | 762.516804ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:49 | 200 | 835.289µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:49.451+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:50 | 200 | 755.520694ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:50 | 200 | 843.444µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:50.209+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:50 | 200 | 754.848081ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:50 | 200 | 861.046µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:50.969+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:51 | 200 | 745.156696ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:51 | 200 | 863.531µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:51.716+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:52 | 200 | 1.054927393s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:52 | 200 | 855.916µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:52.775+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:53 | 200 | 994.745329ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:53 | 200 | 854.313µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:53.773+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:54 | 200 | 1.049445941s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:54 | 200 | 811.638µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:54.824+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:55 | 200 | 
850.098955ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:55 | 200 | 792.085µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:55.678+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:56 | 200 | 993.567647ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:56 | 200 | 817.565µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:56.675+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:57 | 200 | 984.835547ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:57 | 200 | 816.258µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:57.663+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:58 | 200 | 1.027214846s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:58 | 200 | 815.524µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:58.693+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:59 | 200 | 1.056577493s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:59 | 200 | 932.081µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:59.754+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:00 | 200 | 988.791845ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:00 | 200 | 821.784µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:00.745+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:01 | 200 | 1.002698372s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:01 | 200 | 832.131µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:01.751+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:02 | 200 | 1.061438887s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:02 | 200 | 
853.777µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:02.817+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:03 | 200 | 1.073778163s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:03 | 200 | 818.274µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:03.893+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:04 | 200 | 978.437141ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:04 | 200 | 820.064µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:04.875+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:05 | 200 | 984.538035ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:05 | 200 | 815.883µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:05.863+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:06 | 200 | 984.138888ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:06 | 200 | 828.386µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:06.849+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:07 | 200 | 984.296063ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:07 | 200 | 802.968µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:07.837+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:08 | 200 | 1.076913058s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:08 | 200 | 812.68µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:08.917+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:09 | 200 | 994.295375ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:09 | 200 | 829.005µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:09.915+02:00 
level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:10 | 200 | 925.360822ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:10 | 200 | 837.753µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:10.843+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:11 | 200 | 1.007319818s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:11 | 200 | 848.73µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:11.854+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:12 | 200 | 1.000909106s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:12 | 200 | 844.68µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:12.858+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:13 | 200 | 1.008300583s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:13 | 200 | 801.465µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:13.869+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:14 | 200 | 1.006693507s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:14 | 200 | 831.461µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:14.880+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:15 | 200 | 950.929392ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:15 | 200 | 841.006µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:15.833+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:16 | 200 | 952.581587ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:16 | 200 | 839.864µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:16.789+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 
[GIN] 2025/05/13 - 16:49:17 | 200 | 997.29827ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:17 | 200 | 820.078µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:17.790+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:18 | 200 | 1.056098038s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:18 | 200 | 807.002µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:18.849+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:19 | 200 | 994.453749ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:19 | 200 | 803.421µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:19.846+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:20 | 200 | 988.04637ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:20 | 200 | 837.535µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:20.838+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:21 | 200 | 932.301373ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:21 | 200 | 839.469µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:21.773+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:22 | 200 | 1.059917962s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:22 | 200 | 824.76µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:22.836+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:24 | 200 | 1.269081446s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:24 | 200 | 820.431µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:24.109+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:24 | 200 | 730.49696ms | 127.0.0.1 | POST "/api/chat" [GIN] 
2025/05/13 - 16:49:24 | 200 | 837.096µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:24.842+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:25 | 200 | 749.720052ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:25 | 200 | 893.069µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:25.596+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:26 | 200 | 1.0075791s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:26 | 200 | 826.889µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:26.607+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:27 | 200 | 985.420873ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:27 | 200 | 809.116µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:27.595+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:28 | 200 | 1.078693931s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:28 | 200 | 836.868µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:28.676+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:29 | 200 | 1.026126364s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:29 | 200 | 814.288µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:29.707+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:30 | 200 | 984.94677ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:30 | 200 | 817.913µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:30.694+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:31 | 200 | 1.032658607s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:31 | 200 | 794.625µs | 127.0.0.1 | GET "/api/tags" 
time=2025-05-13T16:49:31.730+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:32 | 200 | 1.066533978s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:33 | 200 | 836.575µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:33.168+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:34 | 200 | 1.447059891s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:34 | 200 | 841.524µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:34.618+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:35 | 200 | 1.027703899s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:35 | 200 | 863.036µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:35.648+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:36 | 200 | 994.915362ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:36 | 200 | 803.869µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:36.647+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:38 | 200 | 1.58195298s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:38 | 200 | 884.533µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:38.232+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:39 | 200 | 1.142600246s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:39 | 200 | 825.11µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:39.377+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:40 | 200 | 1.080568844s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:40 | 200 | 815.512µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:40.462+02:00 level=WARN source=ggml.go:152 msg="key not 
found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:41 | 200 | 1.146987692s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:41 | 200 | 811.222µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:41.611+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:42 | 200 | 979.825221ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:42 | 200 | 852.475µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:42.594+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:43 | 200 | 982.748158ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:43 | 200 | 820.496µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:43.581+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:44 | 200 | 1.100700606s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:44 | 200 | 862.996µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:44.684+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:46 | 200 | 1.497456592s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:46 | 200 | 853.133µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:46.185+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:47 | 200 | 1.064251921s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:47 | 200 | 890.363µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:47.253+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:48 | 200 | 919.330615ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:48 | 200 | 887.668µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:48.175+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:49 | 200 | 
937.752028ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:49 | 200 | 826.643µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:49.116+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:50 | 200 | 991.541073ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:50 | 200 | 815.88µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:50.112+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:51 | 200 | 997.471176ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:51 | 200 | 835.182µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:51.112+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:52 | 200 | 1.051647686s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:52 | 200 | 817.042µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:52.166+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:53 | 200 | 1.104912045s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:53 | 200 | 834.291µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:53.275+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:54 | 200 | 945.281302ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:54 | 200 | 1.186419ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:54.224+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:55 | 200 | 898.399962ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:55 | 200 | 796.173µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:55.125+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:56 | 200 | 1.108278874s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:56 | 200 | 
877.902µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:56.237+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:57 | 200 | 922.307376ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:57 | 200 | 818.859µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:57.162+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:58 | 200 | 1.093775179s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:58 | 200 | 826.919µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:58.259+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:59 | 200 | 937.066277ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:59 | 200 | 817.475µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:59.200+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:00 | 200 | 902.689181ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:00 | 200 | 830.142µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:00.105+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:01 | 200 | 1.583952845s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:01 | 200 | 827.615µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:01.692+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:02 | 200 | 1.105850744s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:02 | 200 | 842.06µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:02.802+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:03 | 200 | 928.793041ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:03 | 200 | 827.222µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:03.733+02:00 
level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:04 | 200 | 910.071569ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:04 | 200 | 1.226551ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:04.648+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:05 | 200 | 936.489608ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:05 | 200 | 833.971µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:05.590+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:06 | 200 | 1.164017222s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:06 | 200 | 832.653µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:06.754+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:07 | 200 | 1.022418572s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:07 | 200 | 837.561µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:07.781+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:08 | 200 | 969.502488ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:08 | 200 | 779.343µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:08.753+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:09 | 200 | 961.229721ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:09 | 200 | 821.739µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:09.717+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:10 | 200 | 1.200781498s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:10 | 200 | 838.319µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:10.921+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment 
default=32 [GIN] 2025/05/13 - 16:50:11 | 200 | 1.050320821s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:11 | 200 | 814.57µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:11.975+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:12 | 200 | 985.961573ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:12 | 200 | 904.037µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:12.964+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:14 | 200 | 1.193615905s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:14 | 200 | 820.404µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:14.160+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:15 | 200 | 959.403609ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:15 | 200 | 831.935µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:15.124+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:16 | 200 | 1.176947939s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:16 | 200 | 831.985µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:16.303+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:17 | 200 | 923.669162ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:17 | 200 | 784.075µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:17.230+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:18 | 200 | 969.61224ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:18 | 200 | 806.076µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:18.203+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:19 | 200 | 1.001123719s | 127.0.0.1 | POST 
"/api/chat"
[GIN] 2025/05/13 - 16:50:19 | 200 | 812.874µs | 127.0.0.1 | GET "/api/tags"
time=2025-05-13T16:50:19.207+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:20 | 200 | 948.493703ms | 127.0.0.1 | POST "/api/chat"
[GIN] 2025/05/13 - 16:50:20 | 200 | 829.258µs | 127.0.0.1 | GET "/api/tags"
time=2025-05-13T16:50:20.159+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:21 | 200 | 929.241283ms | 127.0.0.1 | POST "/api/chat"
[GIN] 2025/05/13 - 16:50:21 | 200 | 857.089µs | 127.0.0.1 | GET "/api/tags"
time=2025-05-13T16:50:21.092+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:22 | 200 | 1.107269829s | 127.0.0.1 | POST "/api/chat"
[GIN] 2025/05/13 - 16:50:22 | 200 | 805.287µs | 127.0.0.1 | GET "/api/tags"
time=2025-05-13T16:50:22.202+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:23 | 200 | 1.089536435s | 127.0.0.1 | POST "/api/chat"
[GIN] 2025/05/13 - 16:50:23 | 200 | 836.418µs | 127.0.0.1 | GET "/api/tags"
time=2025-05-13T16:50:23.295+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:23 | 500 | 494.37497ms | 127.0.0.1 | POST "/api/chat"
[GIN] 2025/05/13 - 16:51:36 | 200 | 819.058µs | 127.0.0.1 | GET "/api/tags"
time=2025-05-13T16:51:36.739+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:36.894+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:36.925+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:36.929+02:00 level=INFO source=sched.go:517 msg="updated VRAM based on existing loaded models" gpu=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 library=cuda total="23.6 GiB" available="12.6 GiB"
time=2025-05-13T16:51:36.929+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
time=2025-05-13T16:51:36.930+02:00 level=INFO source=sched.go:754 msg="new model will fit in available VRAM in single GPU, loading" model=/home/leo/.ollama/models/blobs/sha256-2fc97d5d1c3217ac90931d7992762a418e09b2c5c49c030c6c1f9e8fbb221013 gpu=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 parallel=2 available=13522255936 required="10.0 GiB"
time=2025-05-13T16:51:37.047+02:00 level=INFO source=server.go:106 msg="system memory" total="31.2 GiB" free="25.2 GiB" free_swap="542.6 MiB"
time=2025-05-13T16:51:37.047+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
time=2025-05-13T16:51:37.048+02:00 level=INFO source=server.go:139 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=49 layers.split="" memory.available="[12.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="10.0 GiB" memory.required.partial="10.0 GiB" memory.required.kv="1.3 GiB" memory.required.allocations="[10.0 GiB]" memory.weights.total="7.6 GiB" memory.weights.repeating="6.8 GiB" memory.weights.nonrepeating="787.7 MiB" memory.graph.full="519.6 MiB" memory.graph.partial="1.3 GiB"
time=2025-05-13T16:51:37.098+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:37.099+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.num_channels default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.embedding_length default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.head_count default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.layer_norm_epsilon default=0
time=2025-05-13T16:51:37.103+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-13T16:51:37.103+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-13T16:51:37.103+02:00 level=INFO source=server.go:410 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/leo/.ollama/models/blobs/sha256-2fc97d5d1c3217ac90931d7992762a418e09b2c5c49c030c6c1f9e8fbb221013 --ctx-size 8192 --batch-size 512 --n-gpu-layers 49 --threads 10 --parallel 2 --port 42857"
time=2025-05-13T16:51:37.103+02:00 level=INFO source=sched.go:452 msg="loaded runners" count=2
time=2025-05-13T16:51:37.104+02:00 level=INFO source=server.go:589 msg="waiting for llama runner to start responding"
time=2025-05-13T16:51:37.104+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server not responding"
time=2025-05-13T16:51:37.111+02:00 level=INFO source=runner.go:851 msg="starting ollama engine"
time=2025-05-13T16:51:37.112+02:00 level=INFO source=runner.go:914 msg="Server listening on 127.0.0.1:42857"
time=2025-05-13T16:51:37.165+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:37.166+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default="" time=2025-05-13T16:51:37.166+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default="" time=2025-05-13T16:51:37.166+02:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1737 num_key_values=32 load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so time=2025-05-13T16:51:37.204+02:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-05-13T16:51:37.285+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CUDA0 size="8.4 GiB" time=2025-05-13T16:51:37.285+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="787.7 MiB" time=2025-05-13T16:51:37.355+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server loading model" time=2025-05-13T16:51:38.979+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.num_channels default=0 
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.embedding_length default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.head_count default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.layer_norm_epsilon default=0 time=2025-05-13T16:51:38.983+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-13T16:51:38.983+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-13T16:51:39.009+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="308.0 MiB" time=2025-05-13T16:51:39.009+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CPU buffer_type=CPU size="7.5 MiB" time=2025-05-13T16:51:39.109+02:00 level=INFO source=server.go:628 msg="llama runner started in 2.01 seconds" [GIN] 2025/05/13 - 16:51:40 | 200 | 3.625931541s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:40 | 200 | 1.304686ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:40.368+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:41 | 200 | 1.173288276s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:41 | 200 | 884.953µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:41.545+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:42 | 200 | 1.093041205s 
| 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:42 | 200 | 876.112µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:42.641+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:43 | 200 | 1.190148307s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:43 | 200 | 927.104µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:43.835+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:44 | 200 | 1.117178999s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:44 | 200 | 805.534µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:44.955+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:46 | 200 | 1.188636665s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:46 | 200 | 1.21855ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:46.148+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:47 | 200 | 1.126001863s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:47 | 200 | 842.821µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:47.277+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:48 | 200 | 1.062130448s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:48 | 200 | 853.01µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:48.342+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:49 | 200 | 1.192205675s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:49 | 200 | 828.288µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:49.537+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:50 | 200 | 1.064617413s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:50 | 200 | 826.038µs | 
127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:50.605+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:51 | 200 | 1.181298946s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:51 | 200 | 912.704µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:51.792+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:52 | 200 | 1.157731531s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:52 | 200 | 827.216µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:52.951+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:54 | 200 | 1.144311709s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:54 | 200 | 852.097µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:54.098+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:55 | 200 | 1.151178661s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:55 | 200 | 823.791µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:55.253+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:56 | 200 | 1.098051386s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:56 | 200 | 830.556µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:56.354+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:57 | 200 | 1.095882359s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:57 | 200 | 823.501µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:57.455+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:58 | 200 | 1.192077006s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:58 | 200 | 830.431µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:58.649+02:00 level=WARN 
source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:59 | 200 | 1.179899598s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:59 | 200 | 865.571µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:59.831+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:00 | 200 | 1.190354606s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:00 | 200 | 823.648µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:01.025+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:02 | 200 | 1.058573208s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:02 | 200 | 808.631µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:02.087+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:03 | 200 | 1.189769749s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:03 | 200 | 843.265µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:03.282+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:04 | 200 | 1.148332443s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:04 | 200 | 828.562µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:04.432+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:05 | 200 | 1.116938469s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:05 | 200 | 840.112µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:05.552+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:06 | 200 | 1.185360654s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:06 | 200 | 830.644µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:06.741+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 
2025/05/13 - 16:52:07 | 200 | 1.187190771s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:07 | 200 | 837.899µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:07.931+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:09 | 200 | 1.200790074s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:09 | 200 | 899.137µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:09.135+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:10 | 200 | 1.135805072s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:10 | 200 | 1.33389ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:10.275+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:11 | 200 | 1.178719063s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:11 | 200 | 792.231µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:11.457+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:12 | 200 | 1.068732647s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:12 | 200 | 1.358903ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:12.530+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:13 | 200 | 1.084584064s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:13 | 200 | 815.049µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:13.617+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:14 | 200 | 1.099273485s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:14 | 200 | 872.335µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:14.720+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:15 | 200 | 1.291627781s | 127.0.0.1 | POST "/api/chat" [GIN] 
2025/05/13 - 16:52:15 | 200 | 853.691µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:16.017+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:17 | 200 | 1.165006914s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:17 | 200 | 857.917µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:17.183+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:18 | 200 | 1.185455866s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:18 | 200 | 851.02µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:18.372+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:19 | 200 | 1.182260809s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:19 | 200 | 833.657µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:19.558+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:20 | 200 | 1.100934571s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:20 | 200 | 838.036µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:20.662+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:21 | 200 | 1.067580922s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:21 | 200 | 845.81µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:21.734+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:22 | 200 | 824.388844ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:22 | 200 | 851.607µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:22.561+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:23 | 200 | 1.157841226s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:23 | 200 | 808.772µs | 127.0.0.1 | GET "/api/tags" 
time=2025-05-13T16:52:23.722+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:24 | 200 | 1.226060831s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:24 | 200 | 1.02996ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:24.951+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:26 | 200 | 1.300700119s | 127.0.0.1 | POST "/api/chat"
```

### OS

Linux

### GPU

Nvidia

### CPU

Intel

### Ollama version

0.6.8
GiteaMirror added the bug label 2026-05-04 17:07:34 -05:00

@leokeba commented on GitHub (May 13, 2025):

Here's a minimal script to reproduce the issue:

from pydantic import BaseModel
from enum import StrEnum
from typing import List
import ollama

class TweetSubject(StrEnum):
    REGLES_DE_CRISE_ADAPTATIONS = "Règles de crise / adaptations"
    MEDECINE_PROTECTION_SANITAIRE = "Médecine / protection sanitaire"
    VACCIN = "Vaccin"
    PENURIE_GESTION_MATERIEL = "Pénurie / gestion matériel"
    PASS_SANITAIRE = "Pass sanitaire"
    AUTRES = "Autres"
    NSP = "NSP"

subjects_description = {
    TweetSubject.REGLES_DE_CRISE_ADAPTATIONS: "règles édictées pour répondre à la situation de pandémie de covid19 et façons de s’y adapter. Aspect légal, autorisations, interdictions, etc.",
    TweetSubject.MEDECINE_PROTECTION_SANITAIRE: "sujets médicaux, liés au covid19 ou à d’autres maladies, aux risques qu’elles entraînent et aux façons de s’en protéger.",
    TweetSubject.VACCIN: "vaccin contre le covid19. Ne se combine pas avec un autre sujet sauf si plusieurs questions bien distinctes.",
    TweetSubject.PENURIE_GESTION_MATERIEL: "gestion par l’état du matériel nécessaire pour lutter contre le covid19, y compris les moyens humains des hôpitaux, le prix des moyens de protection et leur accessibilité.",
    TweetSubject.PASS_SANITAIRE: "pass sanitaire, passeport vaccinal ou toute différenciation de droit entre personnes vaccinées et personnes non-vaccinées, ou toute autorisation ou interdiction liée au résultat d’un test. Si le sujet « pass sanitaire » est présent dans le message, pas de combinaison avec un autre. Si les sujets « pass sanitaire » et « vaccin » sont présents, indiquer « pass sanitaire », sauf si plusieurs questions bien distinctes.",
    TweetSubject.AUTRES: "sujet qui ne rentre dans aucune des autres catégories.",
    TweetSubject.NSP: "s’il n’y a pas d’élément susceptible d’éclairer sur le sujet du tweet"
}

class TweetEmotion(StrEnum):
    MECONTENTEMENT_COLERE = "Mécontentement / colère"
    GRATITUDE_VALIDATION = "Gratitude / validation"
    NEUTRE = "Neutre"
    AUTRE = "Autre"
    NSP = "NSP"

emotions_description = {
    TweetEmotion.MECONTENTEMENT_COLERE: "expression d’une émotion pouvant aller d’un léger mécontentement à la colère la plus violente.",
    TweetEmotion.GRATITUDE_VALIDATION: "expression d’une gratitude ou validation de propos ou d’actions.",
    TweetEmotion.NEUTRE: "pas d’émotion particulière détectable.",
    TweetEmotion.AUTRE: "expression d’une émotion qui ne rentre dans aucune des autres catégories.",
    TweetEmotion.NSP: "s’il n’y a pas d’élément susceptible d’éclairer sur l’émotion du tweet."
}

class TweetAnalysis(BaseModel):
    subject: List[TweetSubject]
    emotion: List[TweetEmotion]

subject_list = [subject.value for subject in TweetSubject]
emotion_list = [emotion.value for emotion in TweetEmotion]

def build_analysis_prompt(tweet):
    subject_list_string = '''
    - '''.join(f"'{subject}' : {subjects_description[subject]}" for subject in subject_list)
    emotion_list_string = '''
    - '''.join(f"'{emotion}' : {emotions_description[emotion]}" for emotion in emotion_list)

    prompt = f"""Analyze the following tweet and return the subjects and emotions that best characterizes its contents as a json object.
                Do not return more than 2 subjects and 2 emotions.
    
                Tweet: {tweet}

                Subjects: 
                - {subject_list_string}

                Emotions: 
                - {emotion_list_string}"""
    return prompt

def get_ollama_structured_response(prompt: str, model: BaseModel):
        """
        Sends a prompt to the Ollama model and returns the response content.
        """
        response = ollama.chat(
            model='gemma3:12b', 
            messages=[
                {
                    'role': 'user',
                    'content': prompt,
                }
            ],
            format=model.model_json_schema(),
        )
        return model.model_validate_json(response['message']['content'])

tweet1 = '''Il n'y a pas que la Chine et l'Italie dont s'inspirer         
Taiwan semble exemplaire dans son traitement du #Coronavirus  #Onvousrépond 
https://t.co/kr8LE9C0vg'''

tweet2 = '@la_muse88 Ça dépendra des résultats du 1er tour...si LR est en tête pas de confinement avant le 2e tour, sinon....#OnVousRepond #France2'

tweet3 = '''1- 50% des malades en réanimation on moins de 65ans
2- Est-ce que le fait d'être fumeur est un facteur à risque ?
3-Pourquoi les bureaux de tabac restent ouverts (non alimentaire)? 
 #michelcimes #France2 #OnVousRepond #COVIDー19'''

while 1:
    response = get_ollama_structured_response(build_analysis_prompt(tweet2), TweetAnalysis)
    print(response)
    response = get_ollama_structured_response(build_analysis_prompt(tweet1), TweetAnalysis)
    print(response)
    response = get_ollama_structured_response(build_analysis_prompt(tweet3), TweetAnalysis)
    print(response)

@rick-github commented on GitHub (May 13, 2025):

#!/usr/bin/env python3

from pydantic import BaseModel
import ollama

ollama_url = "http://localhost:11434"
model = "gemma3:12b"

class result(BaseModel):
    parameter1: str

messages=[
    {"role": "system", "content": "Extract the information."},
    {"role": "user", "content": "parameter 1 is 'hello'."},
]

def f():
  response = ollama.chat(
      model=model,
      messages=messages,
      format=result.model_json_schema())
  r = result.model_validate_json(response.message.content)
  print(r)

f()

![Image](https://github.com/user-attachments/assets/8b7f96fd-08ed-458a-9bd8-412cbed967a1)

A linear increase points to an unreleased buffer.


@ParthSareen commented on GitHub (May 13, 2025):

Sorry about that! And thanks @leokeba @rick-github for the logs and the data. Will look into it


@rohanshad commented on GitHub (Jun 8, 2025):

Hi all,
I'm actually running into this issue even with v0.9.0. RAM usage creeps up with each response returned by the chat API using the gemma3 family of models until it saturates the system. Profiling shows minimal memory usage in the Python script itself, and the log output is essentially the same as the OP's. In htop, I see multiple instances of ollama runner whose MEM% usage increases stepwise as responses are processed.


@rick-github commented on GitHub (Jun 8, 2025):

After some growth at the start, RSS was pretty constant over the course of an hour.

![Image](https://github.com/user-attachments/assets/2de66cc0-d594-49e1-b252-378cdb4c6c76)

It may be a function of the type of grammar that is being created. Can you provide a simple script like the one above that demonstrates the problem?
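
One way to test the grammar hypothesis is to alternate requests between a trivial schema and an enum-heavy one (like the repro above) and watch whether only the latter makes RSS climb. A sketch under that assumption — the `Simple`/`EnumArray` models are made up for illustration, and the actual `ollama.chat` call is left commented out since it needs a running server:

```python
from enum import Enum
from typing import List

from pydantic import BaseModel

class Simple(BaseModel):
    answer: str

class Label(str, Enum):
    A = "label a"
    B = "label b"

class EnumArray(BaseModel):
    labels: List[Label]

# Each JSON schema is compiled into a different sampling grammar on the
# server, so alternating them isolates grammar complexity as the variable.
schemas = {
    "simple": Simple.model_json_schema(),
    "enum_array": EnumArray.model_json_schema(),
}

# while True:
#     for name, schema in schemas.items():
#         ollama.chat(model="gemma3:12b",
#                     messages=[{"role": "user", "content": "say something"}],
#                     format=schema)
```

If memory only grows with the `enum_array` schema, that would narrow the leak down to the grammar built for enums/arrays rather than structured output in general.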


@rohanshad commented on GitHub (Jun 8, 2025):

This is roughly what my script looks like: a structured JSON response, with an instance of AsyncClient spawned per GPU. I'm pretty convinced the asyncio wrappers themselves aren't the issue; workers are spawned and terminate appropriately with no obvious leak when profiled. I can send you the whole script separately if need be.


async def chat(self, report_text, host, gpu_index, client):
    '''
    Core interface to the Ollama server via AsyncClient.
    The system message is hard-coded; report text is fed in via the worker function.
    '''
    try:
        # Make the API request using the specified host and model
        response = await client.chat(
            model=self.model,
            messages=[{"role": "user", "content": self.system_msg + report_text}],
            format="json"
        )
        if self.debug:
            print('RAW OLLAMA RESPONSE:')
            print(response["message"]["content"])

        # Extract the response content
        response_json = json.loads(response['message']['content'].replace('\n', '').strip().removeprefix('```json').removesuffix('```'))
        total_time = response['total_duration'] / 1e9
        report_tokens = response['prompt_eval_count']

        return response_json, total_time, report_tokens

    except Exception as e:
        print(f"Error on host {host}")
        print(e)


async def worker(self, host, gpu_index, task_queue):
    '''
    Worker function to pass a report into the chat function and return the labelled result.
    Writes to a list (under a lock) to collect outputs from all workers.
    '''
    # Init the ollama async client once per worker
    client = ollama.AsyncClient(host=host)

    while not task_queue.empty():
        report_text = await task_queue.get()

        # Prep and parse inputs
        response, total_time, report_tokens = await self.chat(report_text, host, gpu_index, client)

        record = {
            'total_time': total_time,
            'report_tokens': report_tokens,
            'report_json': response,
        }
        print(f'Labelling complete:(unknown) | {host}:GPU#{gpu_index} | {round(total_time, 2)}s')

        with self.lock:
            self.worker_outputs.append(record)

        del report_text
        del response
        del record

        task_queue.task_done()

@rick-github commented on GitHub (Jun 8, 2025):

I was unable to reproduce with this script fragment, so I think the whole script and some example input is required. [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) with `OLLAMA_DEBUG=1` may also add some insight.

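For reference, debug logging is enabled per the troubleshooting doc by stopping the server and restarting it with the variable set (the `systemctl` step assumes the standard Linux systemd install):

```shell
# systemd installs: stop the service first so the env var takes effect
sudo systemctl stop ollama
OLLAMA_DEBUG=1 ollama serve 2> ollama-debug.log
```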

@rohanshad commented on GitHub (Jun 15, 2025):

Will prep and send a reproducible script soon, but here's the server log with `OLLAMA_DEBUG=1`:

time=2025-06-14T20:24:09.722-04:00 level=INFO source=routes.go:1234 msg="server config" env="map[CUDA_VISIBLE_DEVICES:1 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://localhost:11433 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2562047h47m16.854775807s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:2h0m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-06-14T20:24:09.723-04:00 level=INFO source=images.go:479 msg="total blobs: 21"
time=2025-06-14T20:24:09.723-04:00 level=INFO source=images.go:486 msg="total unused blobs removed: 0"
time=2025-06-14T20:24:09.723-04:00 level=INFO source=routes.go:1287 msg="Listening on 127.0.0.1:11433 (version 0.9.0)"
time=2025-06-14T20:24:09.724-04:00 level=DEBUG source=sched.go:108 msg="starting llm scheduler"
time=2025-06-14T20:24:09.724-04:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-06-14T20:24:09.728-04:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-06-14T20:24:09.728-04:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so*
time=2025-06-14T20:24:09.728-04:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/usr/local/lib/ollama/libcuda.so* /home/rohanshad/Hiesinger Lab Dropbox/MRI ML/cmr_core/utils/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2025-06-14T20:24:09.733-04:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths="[/usr/lib/i386-linux-gnu/libcuda.so.535.230.02 /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02]"
initializing /usr/lib/i386-linux-gnu/libcuda.so.535.230.02
library /usr/lib/i386-linux-gnu/libcuda.so.535.230.02 load err: /usr/lib/i386-linux-gnu/libcuda.so.535.230.02: wrong ELF class: ELFCLASS32
time=2025-06-14T20:24:09.733-04:00 level=DEBUG source=gpu.go:609 msg="skipping 32bit library" library=/usr/lib/i386-linux-gnu/libcuda.so.535.230.02
initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
dlsym: cuInit - 0x7ed2504c2470
dlsym: cuDriverGetVersion - 0x7ed2504c2490
dlsym: cuDeviceGetCount - 0x7ed2504c24d0
dlsym: cuDeviceGet - 0x7ed2504c24b0
dlsym: cuDeviceGetAttribute - 0x7ed2504c25b0
dlsym: cuDeviceGetUuid - 0x7ed2504c2510
dlsym: cuDeviceGetName - 0x7ed2504c24f0
dlsym: cuCtxCreate_v3 - 0x7ed2504ca170
dlsym: cuMemGetInfo_v2 - 0x7ed2504d5640
dlsym: cuCtxDestroy - 0x7ed250524640
calling cuInit
calling cuDriverGetVersion
raw version 0x2ef4
CUDA driver version: 12.2
calling cuDeviceGetCount
device count 1
time=2025-06-14T20:24:09.740-04:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
[GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d] CUDA totalMem 24245mb
[GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d] CUDA freeMem 22415mb
[GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d] Compute Capability 8.6
time=2025-06-14T20:24:09.852-04:00 level=DEBUG source=amd_linux.go:419 msg="amdgpu driver not detected /sys/module/amdgpu"
releasing cuda driver library
time=2025-06-14T20:24:09.852-04:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d library=cuda variant=v12 compute=8.6 driver=12.2 name="NVIDIA RTX A5000" total="23.7 GiB" available="21.9 GiB"
time=2025-06-14T20:24:19.847-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:19.848-04:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.6 GiB" before.free="114.4 GiB" before.free_swap="327.6 MiB" now.total="125.6 GiB" now.free="114.4 GiB" now.free_swap="327.6 MiB"
initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
dlsym: cuInit - 0x7ed2504c2470
dlsym: cuDriverGetVersion - 0x7ed2504c2490
dlsym: cuDeviceGetCount - 0x7ed2504c24d0
dlsym: cuDeviceGet - 0x7ed2504c24b0
dlsym: cuDeviceGetAttribute - 0x7ed2504c25b0
dlsym: cuDeviceGetUuid - 0x7ed2504c2510
dlsym: cuDeviceGetName - 0x7ed2504c24f0
dlsym: cuCtxCreate_v3 - 0x7ed2504ca170
dlsym: cuMemGetInfo_v2 - 0x7ed2504d5640
dlsym: cuCtxDestroy - 0x7ed250524640
calling cuInit
calling cuDriverGetVersion
raw version 0x2ef4
CUDA driver version: 12.2
calling cuDeviceGetCount
device count 1
time=2025-06-14T20:24:20.129-04:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d name="NVIDIA RTX A5000" overhead="0 B" before.total="23.7 GiB" before.free="21.9 GiB" now.total="23.7 GiB" now.free="21.9 GiB" now.used="1.8 GiB"
releasing cuda driver library
time=2025-06-14T20:24:20.129-04:00 level=DEBUG source=sched.go:185 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-06-14T20:24:20.179-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:20.211-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:20.213-04:00 level=DEBUG source=sched.go:228 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:24:20.213-04:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[21.9 GiB]"
time=2025-06-14T20:24:20.217-04:00 level=INFO source=sched.go:788 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 gpu=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d parallel=1 available=23504683008 required="19.9 GiB"
time=2025-06-14T20:24:20.218-04:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.6 GiB" before.free="114.4 GiB" before.free_swap="327.6 MiB" now.total="125.6 GiB" now.free="114.3 GiB" now.free_swap="327.6 MiB"
initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
dlsym: cuInit - 0x7ed2504c2470
dlsym: cuDriverGetVersion - 0x7ed2504c2490
dlsym: cuDeviceGetCount - 0x7ed2504c24d0
dlsym: cuDeviceGet - 0x7ed2504c24b0
dlsym: cuDeviceGetAttribute - 0x7ed2504c25b0
dlsym: cuDeviceGetUuid - 0x7ed2504c2510
dlsym: cuDeviceGetName - 0x7ed2504c24f0
dlsym: cuCtxCreate_v3 - 0x7ed2504ca170
dlsym: cuMemGetInfo_v2 - 0x7ed2504d5640
dlsym: cuCtxDestroy - 0x7ed250524640
calling cuInit
calling cuDriverGetVersion
raw version 0x2ef4
CUDA driver version: 12.2
calling cuDeviceGetCount
device count 1
time=2025-06-14T20:24:20.465-04:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d name="NVIDIA RTX A5000" overhead="0 B" before.total="23.7 GiB" before.free="21.9 GiB" now.total="23.7 GiB" now.free="21.9 GiB" now.used="1.8 GiB"
releasing cuda driver library
time=2025-06-14T20:24:20.465-04:00 level=INFO source=server.go:135 msg="system memory" total="125.6 GiB" free="114.3 GiB" free_swap="327.6 MiB"
time=2025-06-14T20:24:20.465-04:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[21.9 GiB]"
time=2025-06-14T20:24:20.467-04:00 level=INFO source=server.go:168 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split="" memory.available="[21.9 GiB]" memory.gpu_overhead="0 B" memory.required.full="19.9 GiB" memory.required.partial="19.9 GiB" memory.required.kv="944.0 MiB" memory.required.allocations="[19.9 GiB]" memory.weights.total="16.0 GiB" memory.weights.repeating="13.4 GiB" memory.weights.nonrepeating="2.6 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" projector.weights="806.2 MiB" projector.graph="1.0 GiB"
time=2025-06-14T20:24:20.467-04:00 level=DEBUG source=server.go:284 msg="compatible gpu libraries" compatible="[cuda_v12 cuda_v11]"
time=2025-06-14T20:24:20.539-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:20.541-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106
time=2025-06-14T20:24:20.541-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=server.go:360 msg="adding gpu library" path=/usr/local/lib/ollama/cuda_v12
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=server.go:367 msg="adding gpu dependency paths" paths=[/usr/local/lib/ollama/cuda_v12]
time=2025-06-14T20:24:20.544-04:00 level=INFO source=server.go:431 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 --ctx-size 4096 --batch-size 512 --n-gpu-layers 63 --threads 32 --parallel 1 --port 43009"
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_LOAD_TIMEOUT=120m OLLAMA_HOST=http://localhost:11433 OLLAMA_DEBUG=1 CUDA_VISIBLE_DEVICES=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d OLLAMA_KEEP_ALIVE=-1 PATH=/home/rohanshad/.local/bin:/home/rohanshad/anaconda3/envs/cmr_dev/bin:/home/rohanshad/anaconda3/condabin:/home/rohanshad/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin OLLAMA_MODELS=/usr/share/ollama/.ollama/models OLLAMA_FLASH_ATTENTION=0 OLLAMA_NUM_PARALLEL=1 OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12 LD_LIBRARY_PATH=/usr/local/lib/ollama/cuda_v12:/usr/local/lib/ollama/cuda_v12:/usr/local/lib/ollama:/usr/local/lib/ollama
time=2025-06-14T20:24:20.545-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-06-14T20:24:20.545-04:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
time=2025-06-14T20:24:20.545-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding"
time=2025-06-14T20:24:20.561-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-06-14T20:24:20.562-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:43009"
time=2025-06-14T20:24:20.616-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:20.618-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.name default=""
time=2025-06-14T20:24:20.618-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.description default=""
time=2025-06-14T20:24:20.618-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_0 name="" description="" num_tensors=1247 num_key_values=40
time=2025-06-14T20:24:20.618-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
time=2025-06-14T20:24:20.623-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v12
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX A5000, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
time=2025-06-14T20:24:20.698-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-06-14T20:24:20.797-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
time=2025-06-14T20:24:20.864-04:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CUDA0 size="16.8 GiB"
time=2025-06-14T20:24:20.864-04:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CPU size="2.6 GiB"
time=2025-06-14T20:24:20.864-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106
time=2025-06-14T20:24:20.864-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-06-14T20:24:21.106-04:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=972 splits=1
time=2025-06-14T20:24:21.106-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="1.1 GiB"
time=2025-06-14T20:24:21.106-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="0 B"
time=2025-06-14T20:24:21.133-04:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=2737 splits=2
time=2025-06-14T20:24:21.133-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="1.1 GiB"
time=2025-06-14T20:24:21.133-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="10.5 MiB"
time=2025-06-14T20:24:21.134-04:00 level=DEBUG source=runner.go:883 msg=memory allocated.InputWeights=2818572288A allocated.CPU.Graph=11010048A allocated.CUDA0.Weights="[232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 3676309632A]" allocated.CUDA0.Cache="[12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 0U]" allocated.CUDA0.Graph=1190150144A
time=2025-06-14T20:24:21.299-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.06"
time=2025-06-14T20:24:21.550-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.15"
time=2025-06-14T20:24:21.801-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.22"
time=2025-06-14T20:24:22.052-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.30"
time=2025-06-14T20:24:22.303-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.38"
time=2025-06-14T20:24:22.554-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.46"
time=2025-06-14T20:24:22.805-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.54"
time=2025-06-14T20:24:23.056-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.61"
time=2025-06-14T20:24:23.307-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.69"
time=2025-06-14T20:24:23.558-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.77"
time=2025-06-14T20:24:23.809-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.85"
time=2025-06-14T20:24:24.060-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.87"
time=2025-06-14T20:24:24.311-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.89"
time=2025-06-14T20:24:24.561-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.91"
time=2025-06-14T20:24:24.812-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.93"
time=2025-06-14T20:24:25.063-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.95"
time=2025-06-14T20:24:25.313-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.97"
time=2025-06-14T20:24:25.564-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.99"
time=2025-06-14T20:24:25.815-04:00 level=INFO source=server.go:630 msg="llama runner started in 5.27 seconds"
time=2025-06-14T20:24:25.815-04:00 level=DEBUG source=sched.go:495 msg="finished setting up" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:24:25.815-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3158 format="\"json\""
time=2025-06-14T20:24:26.077-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:24:26.077-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=0 prompt=791 used=0 remaining=791
[GIN] 2025/06/14 - 20:24:41 | 200 | 21.398847101s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:24:41.213-04:00 level=DEBUG source=sched.go:503 msg="context for request finished"
time=2025-06-14T20:24:41.213-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:24:41.213-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:24:41.270-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:41.272-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:24:41.272-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=2979 format="\"json\""
time=2025-06-14T20:24:41.513-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:24:41.513-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1170 prompt=759 used=0 remaining=759
[GIN] 2025/06/14 - 20:24:56 | 200 | 15.389986769s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:24:56.606-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:24:56.606-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:24:56.606-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:24:56.650-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:56.652-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:24:56.653-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3331 format="\"json\""
time=2025-06-14T20:24:56.823-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:24:56.824-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1138 prompt=870 used=0 remaining=870
[GIN] 2025/06/14 - 20:25:13 | 200 | 16.622287091s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:25:13.232-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:25:13.232-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:25:13.232-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:25:13.292-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:25:13.294-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:25:13.294-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=4148 format="\"json\""
time=2025-06-14T20:25:13.527-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:25:13.527-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1285 prompt=931 used=0 remaining=931
[GIN] 2025/06/14 - 20:25:30 | 200 |  16.87517067s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:25:30.113-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:25:30.113-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:25:30.113-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:25:30.172-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:25:30.174-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:25:30.174-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=2802 format="\"json\""
time=2025-06-14T20:25:30.407-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:25:30.407-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1346 prompt=721 used=0 remaining=721
[GIN] 2025/06/14 - 20:25:43 | 200 | 13.766561764s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:25:43.884-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:25:43.884-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:25:43.884-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:25:43.940-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:25:43.943-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:25:43.944-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=2965 format="\"json\""
time=2025-06-14T20:25:44.179-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:25:44.180-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1061 prompt=753 used=0 remaining=753
[GIN] 2025/06/14 - 20:26:00 | 200 | 16.590631273s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:26:00.479-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:26:00.479-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:26:00.479-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:26:00.534-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:26:00.536-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:26:00.536-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3681 format="\"json\""
time=2025-06-14T20:26:00.756-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:26:00.756-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1168 prompt=902 used=0 remaining=902
[GIN] 2025/06/14 - 20:26:17 | 200 | 16.851443096s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:26:17.335-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:26:17.335-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:26:17.335-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:26:17.391-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:26:17.393-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:26:17.394-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=4189 format="\"json\""
time=2025-06-14T20:26:17.617-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:26:17.617-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1317 prompt=1121 used=0 remaining=1121
[GIN] 2025/06/14 - 20:26:33 | 200 | 15.704443348s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:26:33.045-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:26:33.045-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:26:33.045-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:26:33.104-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:26:33.106-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:26:33.106-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3249 format="\"json\""
time=2025-06-14T20:26:33.319-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:26:33.319-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1500 prompt=832 used=0 remaining=832
[GIN] 2025/06/14 - 20:26:48 | 200 | 15.555706737s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:26:48.605-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:26:48.605-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:26:48.605-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:26:48.642-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:26:48.644-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:26:48.644-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=6622 format="\"json\""
time=2025-06-14T20:26:48.878-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:26:48.878-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1211 prompt=1567 used=0 remaining=1567
[GIN] 2025/06/14 - 20:27:05 | 200 | 16.591033158s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:27:05.200-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:27:05.201-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:27:05.201-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:27:05.255-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:27:05.257-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:27:05.257-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3115 format="\"json\""
time=2025-06-14T20:27:05.464-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:27:05.464-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1946 prompt=780 used=0 remaining=780
[GIN] 2025/06/14 - 20:27:20 | 200 | 15.367599033s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:27:20.573-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:27:20.573-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:27:20.573-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:27:20.631-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:27:20.633-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:27:20.633-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3543 format="\"json\""
time=2025-06-14T20:27:20.876-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:27:20.876-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1159 prompt=815 used=0 remaining=815
[GIN] 2025/06/14 - 20:27:35 | 200 | 15.308709032s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:27:35.886-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:27:35.886-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:27:35.886-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:27:35.921-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:27:35.922-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:27:35.923-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=4373 format="\"json\""
time=2025-06-14T20:27:36.118-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:27:36.118-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1194 prompt=1036 used=0 remaining=1036
[GIN] 2025/06/14 - 20:27:51 | 200 | 15.496595172s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:27:51.388-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:27:51.388-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:27:51.388-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=sched.go:872 msg="shutting down runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=server.go:1023 msg="stopping llama server" pid=3566569
time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=server.go:1029 msg="waiting for llama server to exit" pid=3566569
time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=sched.go:122 msg="shutting down scheduler pending loop"
time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=sched.go:322 msg="shutting down scheduler completed loop"
time=2025-06-14T20:27:52.053-04:00 level=DEBUG source=server.go:1033 msg="llama server stopped" pid=3566569

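The repeated `POST /api/chat` entries with `format="\"json\""` in the log above suggest the reproduction is a simple request loop with structured output enabled. A minimal sketch of that pattern, using only the Python standard library, is below; the model name, prompt, and endpoint URL are placeholders, not taken from the reporter's actual script:

```python
import json
import urllib.request

# Placeholder endpoint; the default Ollama address is 127.0.0.1:11434.
OLLAMA_URL = "http://127.0.0.1:11434/api/chat"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an /api/chat body requesting structured (JSON) output,
    matching the format="json" entries visible in the server log."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "format": "json",  # structured output
        "stream": False,
    }


def post_chat(body: dict) -> bytes:
    """POST one chat request to a running Ollama server."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Looping like this while watching the runner process's RSS should
    # show the RAM growth described in the issue.
    body = build_chat_request("gemma3:27b-it-qat", "Summarize this as JSON: ...")
    for _ in range(100):
        post_chat(body)
```

Running this against a live server while monitoring memory (e.g. `ps -o rss -p <runner pid>` between iterations) should make the leak observable.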
<!-- gh-comment-id:2973393772 --> @rohanshad commented on GitHub (Jun 15, 2025): Will send and prep a reproducible script soon, but here's the server log with `OLLAMA_DEBUG=1`:
```
time=2025-06-14T20:24:09.722-04:00 level=INFO source=routes.go:1234 msg="server config" env="map[CUDA_VISIBLE_DEVICES:1 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://localhost:11433 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2562047h47m16.854775807s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:2h0m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-06-14T20:24:09.723-04:00 level=INFO source=images.go:479 msg="total blobs: 21"
time=2025-06-14T20:24:09.723-04:00 level=INFO source=images.go:486 msg="total unused blobs removed: 0"
time=2025-06-14T20:24:09.723-04:00 level=INFO source=routes.go:1287 msg="Listening on 127.0.0.1:11433 (version 0.9.0)"
time=2025-06-14T20:24:09.724-04:00 level=DEBUG source=sched.go:108 msg="starting llm scheduler"
time=2025-06-14T20:24:09.724-04:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-06-14T20:24:09.728-04:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-06-14T20:24:09.728-04:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so*
time=2025-06-14T20:24:09.728-04:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/usr/local/lib/ollama/libcuda.so* /home/rohanshad/Hiesinger Lab Dropbox/MRI ML/cmr_core/utils/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2025-06-14T20:24:09.733-04:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths="[/usr/lib/i386-linux-gnu/libcuda.so.535.230.02 /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02]"
initializing /usr/lib/i386-linux-gnu/libcuda.so.535.230.02
library /usr/lib/i386-linux-gnu/libcuda.so.535.230.02 load err: /usr/lib/i386-linux-gnu/libcuda.so.535.230.02: wrong ELF class: ELFCLASS32
time=2025-06-14T20:24:09.733-04:00 level=DEBUG source=gpu.go:609 msg="skipping 32bit library" library=/usr/lib/i386-linux-gnu/libcuda.so.535.230.02
initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
dlsym: cuInit - 0x7ed2504c2470
dlsym: cuDriverGetVersion - 0x7ed2504c2490
dlsym: cuDeviceGetCount - 0x7ed2504c24d0
dlsym: cuDeviceGet - 0x7ed2504c24b0
dlsym: cuDeviceGetAttribute - 0x7ed2504c25b0
dlsym: cuDeviceGetUuid - 0x7ed2504c2510
dlsym: cuDeviceGetName - 0x7ed2504c24f0
dlsym: cuCtxCreate_v3 - 0x7ed2504ca170
dlsym: cuMemGetInfo_v2 - 0x7ed2504d5640
dlsym: cuCtxDestroy - 0x7ed250524640
calling cuInit
calling cuDriverGetVersion
raw version 0x2ef4
CUDA driver version: 12.2
calling cuDeviceGetCount
device count 1
time=2025-06-14T20:24:09.740-04:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
[GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d] CUDA totalMem 24245mb
[GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d] CUDA freeMem 22415mb
[GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d] Compute Capability 8.6
time=2025-06-14T20:24:09.852-04:00 level=DEBUG source=amd_linux.go:419 msg="amdgpu driver not detected /sys/module/amdgpu"
releasing cuda driver library
time=2025-06-14T20:24:09.852-04:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d library=cuda variant=v12 compute=8.6 driver=12.2 name="NVIDIA RTX A5000" total="23.7 GiB" available="21.9 GiB"
time=2025-06-14T20:24:19.847-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:19.848-04:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.6 GiB" before.free="114.4 GiB" before.free_swap="327.6 MiB" now.total="125.6 GiB" now.free="114.4 GiB" now.free_swap="327.6 MiB"
initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
dlsym: cuInit - 0x7ed2504c2470
dlsym: cuDriverGetVersion - 0x7ed2504c2490
dlsym: cuDeviceGetCount - 0x7ed2504c24d0
dlsym: cuDeviceGet - 0x7ed2504c24b0
dlsym: cuDeviceGetAttribute - 0x7ed2504c25b0
dlsym: cuDeviceGetUuid - 0x7ed2504c2510
dlsym: cuDeviceGetName - 0x7ed2504c24f0
dlsym: cuCtxCreate_v3 - 0x7ed2504ca170
dlsym: cuMemGetInfo_v2 - 0x7ed2504d5640
dlsym: cuCtxDestroy - 0x7ed250524640
calling cuInit
calling cuDriverGetVersion
raw version 0x2ef4
CUDA driver version: 12.2
calling cuDeviceGetCount
device count 1
time=2025-06-14T20:24:20.129-04:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d name="NVIDIA RTX A5000" overhead="0 B" before.total="23.7 GiB" before.free="21.9 GiB" now.total="23.7 GiB" now.free="21.9 GiB" now.used="1.8 GiB"
releasing cuda driver library
time=2025-06-14T20:24:20.129-04:00 level=DEBUG source=sched.go:185 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-06-14T20:24:20.179-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:20.211-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:20.213-04:00 level=DEBUG source=sched.go:228 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:24:20.213-04:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[21.9 GiB]"
time=2025-06-14T20:24:20.217-04:00 level=INFO source=sched.go:788 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 gpu=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d parallel=1 available=23504683008 required="19.9 GiB"
time=2025-06-14T20:24:20.218-04:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.6 GiB" before.free="114.4 GiB" before.free_swap="327.6 MiB" now.total="125.6 GiB" now.free="114.3 GiB" now.free_swap="327.6 MiB"
initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
dlsym: cuInit - 0x7ed2504c2470
dlsym: cuDriverGetVersion - 0x7ed2504c2490
dlsym: cuDeviceGetCount - 0x7ed2504c24d0
dlsym: cuDeviceGet - 0x7ed2504c24b0
dlsym: cuDeviceGetAttribute - 0x7ed2504c25b0
dlsym: cuDeviceGetUuid - 0x7ed2504c2510
dlsym: cuDeviceGetName - 0x7ed2504c24f0
dlsym: cuCtxCreate_v3 - 0x7ed2504ca170
dlsym: cuMemGetInfo_v2 - 0x7ed2504d5640
dlsym: cuCtxDestroy - 0x7ed250524640
calling cuInit
calling cuDriverGetVersion
raw version 0x2ef4
CUDA driver version: 12.2
calling cuDeviceGetCount
device count 1
time=2025-06-14T20:24:20.465-04:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d name="NVIDIA RTX A5000" overhead="0 B" before.total="23.7 GiB" before.free="21.9 GiB" now.total="23.7 GiB" now.free="21.9 GiB" now.used="1.8 GiB"
releasing cuda driver library
time=2025-06-14T20:24:20.465-04:00 level=INFO source=server.go:135 msg="system memory" total="125.6 GiB" free="114.3 GiB" free_swap="327.6 MiB"
time=2025-06-14T20:24:20.465-04:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[21.9 GiB]"
time=2025-06-14T20:24:20.467-04:00 level=INFO source=server.go:168 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split="" memory.available="[21.9 GiB]" memory.gpu_overhead="0 B" memory.required.full="19.9 GiB" memory.required.partial="19.9 GiB" memory.required.kv="944.0 MiB" memory.required.allocations="[19.9 GiB]" memory.weights.total="16.0 GiB" memory.weights.repeating="13.4 GiB" memory.weights.nonrepeating="2.6 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" projector.weights="806.2 MiB" projector.graph="1.0 GiB"
time=2025-06-14T20:24:20.467-04:00 level=DEBUG source=server.go:284 msg="compatible gpu libraries" compatible="[cuda_v12 cuda_v11]"
time=2025-06-14T20:24:20.539-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:20.541-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106
time=2025-06-14T20:24:20.541-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=server.go:360 msg="adding gpu library" path=/usr/local/lib/ollama/cuda_v12
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=server.go:367 msg="adding gpu dependency paths" paths=[/usr/local/lib/ollama/cuda_v12]
time=2025-06-14T20:24:20.544-04:00 level=INFO source=server.go:431 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 --ctx-size 4096 --batch-size 512 --n-gpu-layers 63 --threads 32 --parallel 1 --port 43009"
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_LOAD_TIMEOUT=120m OLLAMA_HOST=http://localhost:11433 OLLAMA_DEBUG=1 CUDA_VISIBLE_DEVICES=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d OLLAMA_KEEP_ALIVE=-1 PATH=/home/rohanshad/.local/bin:/home/rohanshad/anaconda3/envs/cmr_dev/bin:/home/rohanshad/anaconda3/condabin:/home/rohanshad/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin OLLAMA_MODELS=/usr/share/ollama/.ollama/models OLLAMA_FLASH_ATTENTION=0 OLLAMA_NUM_PARALLEL=1 OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12 LD_LIBRARY_PATH=/usr/local/lib/ollama/cuda_v12:/usr/local/lib/ollama/cuda_v12:/usr/local/lib/ollama:/usr/local/lib/ollama
time=2025-06-14T20:24:20.545-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-06-14T20:24:20.545-04:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
time=2025-06-14T20:24:20.545-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding"
time=2025-06-14T20:24:20.561-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-06-14T20:24:20.562-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:43009"
time=2025-06-14T20:24:20.616-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:20.618-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.name default=""
time=2025-06-14T20:24:20.618-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.description default=""
time=2025-06-14T20:24:20.618-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_0 name="" description="" num_tensors=1247 num_key_values=40
time=2025-06-14T20:24:20.618-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
time=2025-06-14T20:24:20.623-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v12
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA RTX A5000, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
time=2025-06-14T20:24:20.698-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-06-14T20:24:20.797-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
time=2025-06-14T20:24:20.864-04:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CUDA0 size="16.8 GiB"
time=2025-06-14T20:24:20.864-04:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CPU size="2.6 GiB"
time=2025-06-14T20:24:20.864-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106
time=2025-06-14T20:24:20.864-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-06-14T20:24:21.106-04:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=972 splits=1
time=2025-06-14T20:24:21.106-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="1.1 GiB"
time=2025-06-14T20:24:21.106-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="0 B"
time=2025-06-14T20:24:21.133-04:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=2737 splits=2
time=2025-06-14T20:24:21.133-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="1.1 GiB"
time=2025-06-14T20:24:21.133-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="10.5 MiB"
time=2025-06-14T20:24:21.134-04:00 level=DEBUG source=runner.go:883 msg=memory allocated.InputWeights=2818572288A allocated.CPU.Graph=11010048A allocated.CUDA0.Weights="[232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 3676309632A]" allocated.CUDA0.Cache="[12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 0U]" allocated.CUDA0.Graph=1190150144A
time=2025-06-14T20:24:21.299-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.06"
time=2025-06-14T20:24:21.550-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.15"
time=2025-06-14T20:24:21.801-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.22"
time=2025-06-14T20:24:22.052-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.30"
time=2025-06-14T20:24:22.303-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.38"
time=2025-06-14T20:24:22.554-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.46"
time=2025-06-14T20:24:22.805-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.54"
time=2025-06-14T20:24:23.056-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.61"
time=2025-06-14T20:24:23.307-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.69"
time=2025-06-14T20:24:23.558-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.77"
time=2025-06-14T20:24:23.809-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.85"
time=2025-06-14T20:24:24.060-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.87"
time=2025-06-14T20:24:24.311-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.89"
time=2025-06-14T20:24:24.561-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.91" time=2025-06-14T20:24:24.812-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.93" time=2025-06-14T20:24:25.063-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.95" time=2025-06-14T20:24:25.313-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.97" time=2025-06-14T20:24:25.564-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.99" time=2025-06-14T20:24:25.815-04:00 level=INFO source=server.go:630 msg="llama runner started in 5.27 seconds" time=2025-06-14T20:24:25.815-04:00 level=DEBUG source=sched.go:495 msg="finished setting up" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:24:25.815-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3158 format="\"json\"" time=2025-06-14T20:24:26.077-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:24:26.077-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=0 prompt=791 used=0 remaining=791 [GIN] 2025/06/14 - 20:24:41 | 200 | 21.398847101s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:24:41.213-04:00 level=DEBUG source=sched.go:503 msg="context for request finished" time=2025-06-14T20:24:41.213-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 
runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:24:41.213-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:24:41.270-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:24:41.272-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:24:41.272-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=2979 format="\"json\"" time=2025-06-14T20:24:41.513-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:24:41.513-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1170 prompt=759 used=0 remaining=759 [GIN] 2025/06/14 - 20:24:56 | 200 | 15.389986769s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:24:56.606-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:24:56.606-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" 
runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:24:56.606-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:24:56.650-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:24:56.652-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:24:56.653-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3331 format="\"json\"" time=2025-06-14T20:24:56.823-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:24:56.824-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1138 prompt=870 used=0 remaining=870 [GIN] 2025/06/14 - 20:25:13 | 200 | 16.622287091s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:25:13.232-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 
runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:25:13.232-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:25:13.232-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:25:13.292-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:25:13.294-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:25:13.294-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=4148 format="\"json\"" time=2025-06-14T20:25:13.527-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:25:13.527-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1285 prompt=931 used=0 remaining=931 [GIN] 2025/06/14 - 20:25:30 | 200 | 16.87517067s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:25:30.113-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" 
runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:25:30.113-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:25:30.113-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:25:30.172-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:25:30.174-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:25:30.174-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=2802 format="\"json\"" time=2025-06-14T20:25:30.407-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:25:30.407-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1346 prompt=721 used=0 remaining=721 [GIN] 2025/06/14 - 
20:25:43 | 200 | 13.766561764s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:25:43.884-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:25:43.884-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:25:43.884-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:25:43.940-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:25:43.943-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:25:43.944-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=2965 format="\"json\"" time=2025-06-14T20:25:44.179-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] 
time=2025-06-14T20:25:44.180-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1061 prompt=753 used=0 remaining=753 [GIN] 2025/06/14 - 20:26:00 | 200 | 16.590631273s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:26:00.479-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:26:00.479-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:26:00.479-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:26:00.534-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:26:00.536-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:26:00.536-04:00 level=DEBUG source=server.go:729 msg="completion 
request" images=0 prompt=3681 format="\"json\"" time=2025-06-14T20:26:00.756-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:26:00.756-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1168 prompt=902 used=0 remaining=902 [GIN] 2025/06/14 - 20:26:17 | 200 | 16.851443096s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:26:17.335-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:26:17.335-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:26:17.335-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:26:17.391-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:26:17.393-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" 
model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:26:17.394-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=4189 format="\"json\"" time=2025-06-14T20:26:17.617-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:26:17.617-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1317 prompt=1121 used=0 remaining=1121 [GIN] 2025/06/14 - 20:26:33 | 200 | 15.704443348s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:26:33.045-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:26:33.045-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:26:33.045-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:26:33.104-04:00 level=DEBUG 
source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:26:33.106-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:26:33.106-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3249 format="\"json\"" time=2025-06-14T20:26:33.319-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:26:33.319-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1500 prompt=832 used=0 remaining=832 [GIN] 2025/06/14 - 20:26:48 | 200 | 15.555706737s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:26:48.605-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:26:48.605-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:26:48.605-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 
runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:26:48.642-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:26:48.644-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:26:48.644-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=6622 format="\"json\"" time=2025-06-14T20:26:48.878-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:26:48.878-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1211 prompt=1567 used=0 remaining=1567 [GIN] 2025/06/14 - 20:27:05 | 200 | 16.591033158s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:27:05.200-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:27:05.201-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:27:05.201-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" 
runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:27:05.255-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:27:05.257-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:27:05.257-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3115 format="\"json\"" time=2025-06-14T20:27:05.464-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:27:05.464-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1946 prompt=780 used=0 remaining=780 [GIN] 2025/06/14 - 20:27:20 | 200 | 15.367599033s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:27:20.573-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:27:20.573-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 
runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:27:20.573-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:27:20.631-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:27:20.633-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:27:20.633-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3543 format="\"json\"" time=2025-06-14T20:27:20.876-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:27:20.876-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1159 prompt=815 used=0 remaining=815 [GIN] 2025/06/14 - 20:27:35 | 200 | 15.308709032s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:27:35.886-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:27:35.886-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" 
runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:27:35.886-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:27:35.921-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:27:35.922-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:27:35.923-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=4373 format="\"json\"" time=2025-06-14T20:27:36.118-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:27:36.118-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1194 prompt=1036 used=0 remaining=1036 [GIN] 2025/06/14 - 20:27:51 | 200 | 15.496595172s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:27:51.388-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:27:51.388-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone 
idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:27:51.388-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=sched.go:872 msg="shutting down runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=server.go:1023 msg="stopping llama server" pid=3566569 time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=server.go:1029 msg="waiting for llama server to exit" pid=3566569 time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=sched.go:122 msg="shutting down scheduler pending loop" time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=sched.go:322 msg="shutting down scheduler completed loop" time=2025-06-14T20:27:52.053-04:00 level=DEBUG source=server.go:1033 msg="llama server stopped" pid=3566569 ```
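The log above shows the pattern that triggers the growth: repeated `POST /api/chat` requests with `format="json"` (structured output) against `gemma3:27b-it-qat`. For anyone trying to reproduce or monitor this, here is a minimal sketch. It assumes the standard Ollama REST API shape (`/api/chat` with `model`, `messages`, `format`, `stream` fields) and Linux `/proc` for RSS sampling; the runner PID (`3566569` in the log) would come from the server log or `pgrep`. The helper names are illustrative, not part of any official tooling.

```python
# Reproduction/monitoring sketch for the reported leak: send /api/chat
# requests with format="json" and sample the runner's RSS between calls.
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/chat"  # default host from the log

def make_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming chat request asking for JSON-constrained output."""
    payload = {
        "model": "gemma3:27b-it-qat",          # model seen in the log
        "messages": [{"role": "user", "content": prompt}],
        "format": "json",                      # structured output, the implicated setting
        "stream": False,
    }
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )

def runner_rss_kib(pid: int) -> int:
    """Read the resident set size (KiB) of a process from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

if __name__ == "__main__":
    # Fire requests in a loop and print RSS after each; a steadily climbing
    # number with no plateau is the symptom described in this issue.
    runner_pid = 3566569  # replace with the runner.pid from your own log
    for i in range(20):
        req = make_request("Return a JSON object describing the weather.")
        with urllib.request.urlopen(req) as resp:
            resp.read()
        print(f"request {i}: runner RSS = {runner_rss_kib(runner_pid)} KiB")
```

If RSS stabilizes after a while (as reported below for v0.9.0), the growth may be allocator or cache warm-up rather than an unbounded leak; if it climbs linearly per request, that points at per-request state (e.g. the grammar built for structured output) not being released.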

@rohanshad commented on GitHub (Jun 15, 2025):

Actually realized that with v0.9.0 things appear to stabilize after about 45 minutes for me, which is new behavior; I just wasn't running it long enough to see that change. Thanks for showing that second graph, that helped a lot!

<!-- gh-comment-id:2973444797 -->

Reference: github-starred/ollama#69085