[GH-ISSUE #10688] Memory leak during inference using Gemma3 with structured output #69085

New Issue

GiteaMirror · 2026-05-04T17:07:34-05:00

GiteaMirror commented

2026-05-04 17:07:34 -05:00

Originally created by @leokeba on GitHub (May 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10688

Originally assigned to: @ParthSareen on GitHub.

What is the issue?

I am getting what looks like a big memory leak when using ollama on a debian host with structured output with gemma3-12b. It's running on a 3090Ti with default settings.

Memory (system RAM) usage grows very fast (about 30mb/sec) continuously until it saturates the host and hangs. I then need to restart the ollama server and the same thing happens as soon as I start inferencing again. If I stop the inference script the memory usage does not go down, I always have to kill the process to clear the RAM.

I'm attaching the log output from the server, is anything else needed to look into this ?

Relevant log output

leo@debian-nvidia:~$ ollama serve
2025/05/13 16:48:00 routes.go:1233: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/leo/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-05-13T16:48:00.561+02:00 level=INFO source=images.go:463 msg="total blobs: 33"
time=2025-05-13T16:48:00.562+02:00 level=INFO source=images.go:470 msg="total unused blobs removed: 0"
time=2025-05-13T16:48:00.562+02:00 level=INFO source=routes.go:1300 msg="Listening on 127.0.0.1:11434 (version 0.6.8)"
time=2025-05-13T16:48:00.562+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-05-13T16:48:00.784+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 library=cuda variant=v12 compute=8.6 driver=12.8 name="NVIDIA GeForce RTX 3090 Ti" total="23.6 GiB" available="23.3 GiB"
[GIN] 2025/05/13 - 16:48:37 | 200 |     980.376µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:37.849+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:48:37.996+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:48:38.026+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:48:38.028+02:00 level=INFO source=sched.go:754 msg="new model will fit in available VRAM in single GPU, loading" model=/home/leo/.ollama/models/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de gpu=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 parallel=2 available=24974786560 required="11.0 GiB"
time=2025-05-13T16:48:38.137+02:00 level=INFO source=server.go:106 msg="system memory" total="31.2 GiB" free="29.3 GiB" free_swap="542.1 MiB"
time=2025-05-13T16:48:38.139+02:00 level=INFO source=server.go:139 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=49 layers.split="" memory.available="[23.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="11.0 GiB" memory.required.partial="11.0 GiB" memory.required.kv="1.3 GiB" memory.required.allocations="[11.0 GiB]" memory.weights.total="6.8 GiB" memory.weights.repeating="6.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="519.5 MiB" memory.graph.partial="1.3 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-05-13T16:48:38.192+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:48:38.193+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-13T16:48:38.197+02:00 level=INFO source=server.go:410 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/leo/.ollama/models/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de --ctx-size 8192 --batch-size 512 --n-gpu-layers 49 --threads 10 --parallel 2 --port 33857"
time=2025-05-13T16:48:38.197+02:00 level=INFO source=sched.go:452 msg="loaded runners" count=1
time=2025-05-13T16:48:38.197+02:00 level=INFO source=server.go:589 msg="waiting for llama runner to start responding"
time=2025-05-13T16:48:38.197+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server not responding"
time=2025-05-13T16:48:38.205+02:00 level=INFO source=runner.go:851 msg="starting ollama engine"
time=2025-05-13T16:48:38.205+02:00 level=INFO source=runner.go:914 msg="Server listening on 127.0.0.1:33857"
time=2025-05-13T16:48:38.255+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:48:38.256+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-13T16:48:38.256+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-13T16:48:38.256+02:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=37
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
time=2025-05-13T16:48:38.313+02:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-05-13T16:48:38.385+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CUDA0 size="7.6 GiB"
time=2025-05-13T16:48:38.385+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="787.5 MiB"
time=2025-05-13T16:48:38.449+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-13T16:48:43.663+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-13T16:48:43.694+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="308.0 MiB"
time=2025-05-13T16:48:43.694+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CPU buffer_type=CPU size="7.5 MiB"
time=2025-05-13T16:48:43.713+02:00 level=INFO source=server.go:628 msg="llama runner started in 5.52 seconds"
[GIN] 2025/05/13 - 16:48:45 | 200 |  7.544991459s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:45 | 200 |     858.796µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:45.387+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:46 | 200 |  990.816217ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:46 | 200 |     821.351µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:46.381+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:47 | 200 |  768.481138ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:47 | 200 |       828.9µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:47.153+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:47 | 200 |  762.318591ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:47 | 200 |     813.687µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:47.918+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:48 | 200 |  763.796017ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:48 | 200 |     935.834µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:48.686+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:49 | 200 |  762.516804ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:49 | 200 |     835.289µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:49.451+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:50 | 200 |  755.520694ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:50 | 200 |     843.444µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:50.209+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:50 | 200 |  754.848081ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:50 | 200 |     861.046µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:50.969+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:51 | 200 |  745.156696ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:51 | 200 |     863.531µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:51.716+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:52 | 200 |  1.054927393s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:52 | 200 |     855.916µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:52.775+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:53 | 200 |  994.745329ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:53 | 200 |     854.313µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:53.773+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:54 | 200 |  1.049445941s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:54 | 200 |     811.638µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:54.824+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:55 | 200 |  850.098955ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:55 | 200 |     792.085µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:55.678+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:56 | 200 |  993.567647ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:56 | 200 |     817.565µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:56.675+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:57 | 200 |  984.835547ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:57 | 200 |     816.258µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:57.663+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:58 | 200 |  1.027214846s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:58 | 200 |     815.524µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:58.693+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:48:59 | 200 |  1.056577493s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:48:59 | 200 |     932.081µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:48:59.754+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:00 | 200 |  988.791845ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:00 | 200 |     821.784µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:00.745+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:01 | 200 |  1.002698372s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:01 | 200 |     832.131µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:01.751+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:02 | 200 |  1.061438887s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:02 | 200 |     853.777µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:02.817+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:03 | 200 |  1.073778163s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:03 | 200 |     818.274µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:03.893+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:04 | 200 |  978.437141ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:04 | 200 |     820.064µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:04.875+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:05 | 200 |  984.538035ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:05 | 200 |     815.883µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:05.863+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:06 | 200 |  984.138888ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:06 | 200 |     828.386µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:06.849+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:07 | 200 |  984.296063ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:07 | 200 |     802.968µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:07.837+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:08 | 200 |  1.076913058s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:08 | 200 |      812.68µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:08.917+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:09 | 200 |  994.295375ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:09 | 200 |     829.005µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:09.915+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:10 | 200 |  925.360822ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:10 | 200 |     837.753µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:10.843+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:11 | 200 |  1.007319818s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:11 | 200 |      848.73µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:11.854+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:12 | 200 |  1.000909106s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:12 | 200 |      844.68µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:12.858+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:13 | 200 |  1.008300583s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:13 | 200 |     801.465µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:13.869+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:14 | 200 |  1.006693507s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:14 | 200 |     831.461µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:14.880+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:15 | 200 |  950.929392ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:15 | 200 |     841.006µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:15.833+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:16 | 200 |  952.581587ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:16 | 200 |     839.864µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:16.789+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:17 | 200 |   997.29827ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:17 | 200 |     820.078µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:17.790+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:18 | 200 |  1.056098038s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:18 | 200 |     807.002µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:18.849+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:19 | 200 |  994.453749ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:19 | 200 |     803.421µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:19.846+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:20 | 200 |   988.04637ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:20 | 200 |     837.535µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:20.838+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:21 | 200 |  932.301373ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:21 | 200 |     839.469µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:21.773+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:22 | 200 |  1.059917962s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:22 | 200 |      824.76µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:22.836+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:24 | 200 |  1.269081446s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:24 | 200 |     820.431µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:24.109+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:24 | 200 |   730.49696ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:24 | 200 |     837.096µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:24.842+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:25 | 200 |  749.720052ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:25 | 200 |     893.069µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:25.596+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:26 | 200 |    1.0075791s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:26 | 200 |     826.889µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:26.607+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:27 | 200 |  985.420873ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:27 | 200 |     809.116µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:27.595+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:28 | 200 |  1.078693931s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:28 | 200 |     836.868µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:28.676+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:29 | 200 |  1.026126364s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:29 | 200 |     814.288µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:29.707+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:30 | 200 |   984.94677ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:30 | 200 |     817.913µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:30.694+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:31 | 200 |  1.032658607s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:31 | 200 |     794.625µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:31.730+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:32 | 200 |  1.066533978s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:33 | 200 |     836.575µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:33.168+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:34 | 200 |  1.447059891s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:34 | 200 |     841.524µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:34.618+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:35 | 200 |  1.027703899s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:35 | 200 |     863.036µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:35.648+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:36 | 200 |  994.915362ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:36 | 200 |     803.869µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:36.647+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:38 | 200 |   1.58195298s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:38 | 200 |     884.533µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:38.232+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:39 | 200 |  1.142600246s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:39 | 200 |      825.11µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:39.377+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:40 | 200 |  1.080568844s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:40 | 200 |     815.512µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:40.462+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:41 | 200 |  1.146987692s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:41 | 200 |     811.222µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:41.611+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:42 | 200 |  979.825221ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:42 | 200 |     852.475µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:42.594+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:43 | 200 |  982.748158ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:43 | 200 |     820.496µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:43.581+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:44 | 200 |  1.100700606s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:44 | 200 |     862.996µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:44.684+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:46 | 200 |  1.497456592s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:46 | 200 |     853.133µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:46.185+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:47 | 200 |  1.064251921s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:47 | 200 |     890.363µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:47.253+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:48 | 200 |  919.330615ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:48 | 200 |     887.668µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:48.175+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:49 | 200 |  937.752028ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:49 | 200 |     826.643µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:49.116+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:50 | 200 |  991.541073ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:50 | 200 |      815.88µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:50.112+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:51 | 200 |  997.471176ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:51 | 200 |     835.182µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:51.112+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:52 | 200 |  1.051647686s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:52 | 200 |     817.042µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:52.166+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:53 | 200 |  1.104912045s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:53 | 200 |     834.291µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:53.275+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:54 | 200 |  945.281302ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:54 | 200 |    1.186419ms |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:54.224+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:55 | 200 |  898.399962ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:55 | 200 |     796.173µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:55.125+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:56 | 200 |  1.108278874s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:56 | 200 |     877.902µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:56.237+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:57 | 200 |  922.307376ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:57 | 200 |     818.859µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:57.162+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:58 | 200 |  1.093775179s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:58 | 200 |     826.919µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:58.259+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:49:59 | 200 |  937.066277ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:49:59 | 200 |     817.475µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:49:59.200+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:00 | 200 |  902.689181ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:00 | 200 |     830.142µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:00.105+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:01 | 200 |  1.583952845s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:01 | 200 |     827.615µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:01.692+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:02 | 200 |  1.105850744s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:02 | 200 |      842.06µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:02.802+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:03 | 200 |  928.793041ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:03 | 200 |     827.222µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:03.733+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:04 | 200 |  910.071569ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:04 | 200 |    1.226551ms |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:04.648+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:05 | 200 |  936.489608ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:05 | 200 |     833.971µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:05.590+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:06 | 200 |  1.164017222s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:06 | 200 |     832.653µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:06.754+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:07 | 200 |  1.022418572s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:07 | 200 |     837.561µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:07.781+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:08 | 200 |  969.502488ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:08 | 200 |     779.343µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:08.753+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:09 | 200 |  961.229721ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:09 | 200 |     821.739µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:09.717+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:10 | 200 |  1.200781498s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:10 | 200 |     838.319µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:10.921+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:11 | 200 |  1.050320821s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:11 | 200 |      814.57µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:11.975+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:12 | 200 |  985.961573ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:12 | 200 |     904.037µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:12.964+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:14 | 200 |  1.193615905s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:14 | 200 |     820.404µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:14.160+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:15 | 200 |  959.403609ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:15 | 200 |     831.935µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:15.124+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:16 | 200 |  1.176947939s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:16 | 200 |     831.985µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:16.303+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:17 | 200 |  923.669162ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:17 | 200 |     784.075µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:17.230+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:18 | 200 |   969.61224ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:18 | 200 |     806.076µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:18.203+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:19 | 200 |  1.001123719s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:19 | 200 |     812.874µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:19.207+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:20 | 200 |  948.493703ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:20 | 200 |     829.258µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:20.159+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:21 | 200 |  929.241283ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:21 | 200 |     857.089µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:21.092+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:22 | 200 |  1.107269829s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:22 | 200 |     805.287µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:22.202+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:23 | 200 |  1.089536435s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:50:23 | 200 |     836.418µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:50:23.295+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:50:23 | 500 |   494.37497ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:36 | 200 |     819.058µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:36.739+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:36.894+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:36.925+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:36.929+02:00 level=INFO source=sched.go:517 msg="updated VRAM based on existing loaded models" gpu=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 library=cuda total="23.6 GiB" available="12.6 GiB"
time=2025-05-13T16:51:36.929+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
time=2025-05-13T16:51:36.930+02:00 level=INFO source=sched.go:754 msg="new model will fit in available VRAM in single GPU, loading" model=/home/leo/.ollama/models/blobs/sha256-2fc97d5d1c3217ac90931d7992762a418e09b2c5c49c030c6c1f9e8fbb221013 gpu=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 parallel=2 available=13522255936 required="10.0 GiB"
time=2025-05-13T16:51:37.047+02:00 level=INFO source=server.go:106 msg="system memory" total="31.2 GiB" free="25.2 GiB" free_swap="542.6 MiB"
time=2025-05-13T16:51:37.047+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
time=2025-05-13T16:51:37.048+02:00 level=INFO source=server.go:139 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=49 layers.split="" memory.available="[12.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="10.0 GiB" memory.required.partial="10.0 GiB" memory.required.kv="1.3 GiB" memory.required.allocations="[10.0 GiB]" memory.weights.total="7.6 GiB" memory.weights.repeating="6.8 GiB" memory.weights.nonrepeating="787.7 MiB" memory.graph.full="519.6 MiB" memory.graph.partial="1.3 GiB"
time=2025-05-13T16:51:37.098+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:37.099+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.num_channels default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.embedding_length default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.head_count default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.layer_norm_epsilon default=0
time=2025-05-13T16:51:37.103+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-13T16:51:37.103+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-13T16:51:37.103+02:00 level=INFO source=server.go:410 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/leo/.ollama/models/blobs/sha256-2fc97d5d1c3217ac90931d7992762a418e09b2c5c49c030c6c1f9e8fbb221013 --ctx-size 8192 --batch-size 512 --n-gpu-layers 49 --threads 10 --parallel 2 --port 42857"
time=2025-05-13T16:51:37.103+02:00 level=INFO source=sched.go:452 msg="loaded runners" count=2
time=2025-05-13T16:51:37.104+02:00 level=INFO source=server.go:589 msg="waiting for llama runner to start responding"
time=2025-05-13T16:51:37.104+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server not responding"
time=2025-05-13T16:51:37.111+02:00 level=INFO source=runner.go:851 msg="starting ollama engine"
time=2025-05-13T16:51:37.112+02:00 level=INFO source=runner.go:914 msg="Server listening on 127.0.0.1:42857"
time=2025-05-13T16:51:37.165+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-05-13T16:51:37.166+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default=""
time=2025-05-13T16:51:37.166+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
time=2025-05-13T16:51:37.166+02:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1737 num_key_values=32
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
time=2025-05-13T16:51:37.204+02:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-05-13T16:51:37.285+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CUDA0 size="8.4 GiB"
time=2025-05-13T16:51:37.285+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="787.7 MiB"
time=2025-05-13T16:51:37.355+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server loading model"
time=2025-05-13T16:51:38.979+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.num_channels default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.embedding_length default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.head_count default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.layer_norm_epsilon default=0
time=2025-05-13T16:51:38.983+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-05-13T16:51:38.983+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-05-13T16:51:39.009+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="308.0 MiB"
time=2025-05-13T16:51:39.009+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CPU buffer_type=CPU size="7.5 MiB"
time=2025-05-13T16:51:39.109+02:00 level=INFO source=server.go:628 msg="llama runner started in 2.01 seconds"
[GIN] 2025/05/13 - 16:51:40 | 200 |  3.625931541s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:40 | 200 |    1.304686ms |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:40.368+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:41 | 200 |  1.173288276s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:41 | 200 |     884.953µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:41.545+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:42 | 200 |  1.093041205s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:42 | 200 |     876.112µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:42.641+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:43 | 200 |  1.190148307s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:43 | 200 |     927.104µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:43.835+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:44 | 200 |  1.117178999s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:44 | 200 |     805.534µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:44.955+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:46 | 200 |  1.188636665s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:46 | 200 |     1.21855ms |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:46.148+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:47 | 200 |  1.126001863s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:47 | 200 |     842.821µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:47.277+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:48 | 200 |  1.062130448s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:48 | 200 |      853.01µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:48.342+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:49 | 200 |  1.192205675s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:49 | 200 |     828.288µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:49.537+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:50 | 200 |  1.064617413s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:50 | 200 |     826.038µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:50.605+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:51 | 200 |  1.181298946s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:51 | 200 |     912.704µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:51.792+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:52 | 200 |  1.157731531s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:52 | 200 |     827.216µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:52.951+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:54 | 200 |  1.144311709s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:54 | 200 |     852.097µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:54.098+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:55 | 200 |  1.151178661s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:55 | 200 |     823.791µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:55.253+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:56 | 200 |  1.098051386s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:56 | 200 |     830.556µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:56.354+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:57 | 200 |  1.095882359s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:57 | 200 |     823.501µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:57.455+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:58 | 200 |  1.192077006s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:58 | 200 |     830.431µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:58.649+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:51:59 | 200 |  1.179899598s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:51:59 | 200 |     865.571µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:51:59.831+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:00 | 200 |  1.190354606s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:00 | 200 |     823.648µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:01.025+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:02 | 200 |  1.058573208s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:02 | 200 |     808.631µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:02.087+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:03 | 200 |  1.189769749s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:03 | 200 |     843.265µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:03.282+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:04 | 200 |  1.148332443s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:04 | 200 |     828.562µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:04.432+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:05 | 200 |  1.116938469s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:05 | 200 |     840.112µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:05.552+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:06 | 200 |  1.185360654s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:06 | 200 |     830.644µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:06.741+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:07 | 200 |  1.187190771s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:07 | 200 |     837.899µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:07.931+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:09 | 200 |  1.200790074s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:09 | 200 |     899.137µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:09.135+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:10 | 200 |  1.135805072s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:10 | 200 |     1.33389ms |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:10.275+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:11 | 200 |  1.178719063s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:11 | 200 |     792.231µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:11.457+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:12 | 200 |  1.068732647s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:12 | 200 |    1.358903ms |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:12.530+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:13 | 200 |  1.084584064s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:13 | 200 |     815.049µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:13.617+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:14 | 200 |  1.099273485s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:14 | 200 |     872.335µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:14.720+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:15 | 200 |  1.291627781s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:15 | 200 |     853.691µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:16.017+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:17 | 200 |  1.165006914s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:17 | 200 |     857.917µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:17.183+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:18 | 200 |  1.185455866s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:18 | 200 |      851.02µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:18.372+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:19 | 200 |  1.182260809s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:19 | 200 |     833.657µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:19.558+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:20 | 200 |  1.100934571s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:20 | 200 |     838.036µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:20.662+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:21 | 200 |  1.067580922s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:21 | 200 |      845.81µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:21.734+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:22 | 200 |  824.388844ms |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:22 | 200 |     851.607µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:22.561+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:23 | 200 |  1.157841226s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:23 | 200 |     808.772µs |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:23.722+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:24 | 200 |  1.226060831s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/05/13 - 16:52:24 | 200 |     1.02996ms |       127.0.0.1 | GET      "/api/tags"
time=2025-05-13T16:52:24.951+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
[GIN] 2025/05/13 - 16:52:26 | 200 |  1.300700119s |       127.0.0.1 | POST     "/api/chat"

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.6.8

Originally created by @leokeba on GitHub (May 13, 2025). Original GitHub issue: https://github.com/ollama/ollama/issues/10688 Originally assigned to: @ParthSareen on GitHub. ### What is the issue? I am getting what looks like a big memory leak when using ollama on a debian host with structured output with gemma3-12b. It's running on a 3090Ti with default settings. Memory (system RAM) usage grows very fast (about 30mb/sec) continuously until it saturates the host and hangs. I then need to restart the ollama server and the same thing happens as soon as I start inferencing again. If I stop the inference script the memory usage does not go down, I always have to kill the process to clear the RAM. I'm attaching the log output from the server, is anything else needed to look into this ? ### Relevant log output ```shell leo@debian-nvidia:~$ ollama serve 2025/05/13 16:48:00 routes.go:1233: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/leo/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" time=2025-05-13T16:48:00.561+02:00 level=INFO source=images.go:463 msg="total blobs: 33" time=2025-05-13T16:48:00.562+02:00 level=INFO source=images.go:470 msg="total unused blobs removed: 0" time=2025-05-13T16:48:00.562+02:00 level=INFO source=routes.go:1300 msg="Listening on 127.0.0.1:11434 (version 0.6.8)" time=2025-05-13T16:48:00.562+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs" time=2025-05-13T16:48:00.784+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 library=cuda variant=v12 compute=8.6 driver=12.8 name="NVIDIA GeForce RTX 3090 Ti" total="23.6 GiB" available="23.3 GiB" [GIN] 2025/05/13 - 16:48:37 | 200 | 980.376µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:37.849+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-13T16:48:37.996+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-13T16:48:38.026+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-13T16:48:38.028+02:00 level=INFO source=sched.go:754 msg="new model will fit in available VRAM in single GPU, loading" model=/home/leo/.ollama/models/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de gpu=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 parallel=2 available=24974786560 required="11.0 GiB" time=2025-05-13T16:48:38.137+02:00 level=INFO source=server.go:106 msg="system memory" total="31.2 GiB" free="29.3 GiB" free_swap="542.1 MiB" time=2025-05-13T16:48:38.139+02:00 level=INFO source=server.go:139 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=49 layers.split="" memory.available="[23.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="11.0 GiB" memory.required.partial="11.0 GiB" memory.required.kv="1.3 GiB" memory.required.allocations="[11.0 GiB]" memory.weights.total="6.8 GiB" memory.weights.repeating="6.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="519.5 MiB" memory.graph.partial="1.3 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" time=2025-05-13T16:48:38.192+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-13T16:48:38.193+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-13T16:48:38.196+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-13T16:48:38.197+02:00 level=INFO source=server.go:410 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/leo/.ollama/models/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de --ctx-size 8192 --batch-size 512 --n-gpu-layers 49 --threads 10 --parallel 2 --port 33857" time=2025-05-13T16:48:38.197+02:00 level=INFO source=sched.go:452 msg="loaded runners" count=1 time=2025-05-13T16:48:38.197+02:00 level=INFO source=server.go:589 msg="waiting for llama runner to start responding" time=2025-05-13T16:48:38.197+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server not responding" time=2025-05-13T16:48:38.205+02:00 level=INFO source=runner.go:851 msg="starting ollama engine" time=2025-05-13T16:48:38.205+02:00 level=INFO source=runner.go:914 msg="Server listening on 127.0.0.1:33857" time=2025-05-13T16:48:38.255+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-13T16:48:38.256+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default="" time=2025-05-13T16:48:38.256+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default="" time=2025-05-13T16:48:38.256+02:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=37 load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so time=2025-05-13T16:48:38.313+02:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-05-13T16:48:38.385+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CUDA0 size="7.6 GiB" time=2025-05-13T16:48:38.385+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="787.5 MiB" time=2025-05-13T16:48:38.449+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server loading model" time=2025-05-13T16:48:43.663+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-13T16:48:43.667+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-13T16:48:43.694+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="308.0 MiB" time=2025-05-13T16:48:43.694+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CPU buffer_type=CPU size="7.5 MiB" time=2025-05-13T16:48:43.713+02:00 level=INFO source=server.go:628 msg="llama runner started in 5.52 seconds" [GIN] 2025/05/13 - 16:48:45 | 200 | 7.544991459s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:45 | 200 | 858.796µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:45.387+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:46 | 200 | 990.816217ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:46 | 200 | 821.351µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:46.381+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:47 | 200 | 768.481138ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:47 | 200 | 828.9µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:47.153+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:47 | 200 | 762.318591ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:47 | 200 | 813.687µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:47.918+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:48 | 200 | 763.796017ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:48 | 200 | 935.834µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:48.686+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:49 | 200 | 762.516804ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:49 | 200 | 835.289µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:49.451+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:50 | 200 | 755.520694ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:50 | 200 | 843.444µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:50.209+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:50 | 200 | 754.848081ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:50 | 200 | 861.046µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:50.969+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:51 | 200 | 745.156696ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:51 | 200 | 863.531µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:51.716+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:52 | 200 | 1.054927393s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:52 | 200 | 855.916µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:52.775+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:53 | 200 | 994.745329ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:53 | 200 | 854.313µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:53.773+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:54 | 200 | 1.049445941s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:54 | 200 | 811.638µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:54.824+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:55 | 200 | 850.098955ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:55 | 200 | 792.085µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:55.678+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:56 | 200 | 993.567647ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:56 | 200 | 817.565µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:56.675+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:57 | 200 | 984.835547ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:57 | 200 | 816.258µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:57.663+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:58 | 200 | 1.027214846s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:58 | 200 | 815.524µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:58.693+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:48:59 | 200 | 1.056577493s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:48:59 | 200 | 932.081µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:48:59.754+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:00 | 200 | 988.791845ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:00 | 200 | 821.784µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:00.745+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:01 | 200 | 1.002698372s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:01 | 200 | 832.131µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:01.751+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:02 | 200 | 1.061438887s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:02 | 200 | 853.777µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:02.817+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:03 | 200 | 1.073778163s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:03 | 200 | 818.274µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:03.893+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:04 | 200 | 978.437141ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:04 | 200 | 820.064µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:04.875+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:05 | 200 | 984.538035ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:05 | 200 | 815.883µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:05.863+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:06 | 200 | 984.138888ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:06 | 200 | 828.386µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:06.849+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:07 | 200 | 984.296063ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:07 | 200 | 802.968µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:07.837+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:08 | 200 | 1.076913058s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:08 | 200 | 812.68µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:08.917+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:09 | 200 | 994.295375ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:09 | 200 | 829.005µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:09.915+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:10 | 200 | 925.360822ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:10 | 200 | 837.753µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:10.843+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:11 | 200 | 1.007319818s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:11 | 200 | 848.73µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:11.854+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:12 | 200 | 1.000909106s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:12 | 200 | 844.68µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:12.858+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:13 | 200 | 1.008300583s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:13 | 200 | 801.465µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:13.869+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:14 | 200 | 1.006693507s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:14 | 200 | 831.461µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:14.880+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:15 | 200 | 950.929392ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:15 | 200 | 841.006µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:15.833+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:16 | 200 | 952.581587ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:16 | 200 | 839.864µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:16.789+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:17 | 200 | 997.29827ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:17 | 200 | 820.078µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:17.790+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:18 | 200 | 1.056098038s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:18 | 200 | 807.002µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:18.849+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:19 | 200 | 994.453749ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:19 | 200 | 803.421µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:19.846+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:20 | 200 | 988.04637ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:20 | 200 | 837.535µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:20.838+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:21 | 200 | 932.301373ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:21 | 200 | 839.469µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:21.773+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:22 | 200 | 1.059917962s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:22 | 200 | 824.76µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:22.836+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:24 | 200 | 1.269081446s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:24 | 200 | 820.431µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:24.109+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:24 | 200 | 730.49696ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:24 | 200 | 837.096µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:24.842+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:25 | 200 | 749.720052ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:25 | 200 | 893.069µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:25.596+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:26 | 200 | 1.0075791s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:26 | 200 | 826.889µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:26.607+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:27 | 200 | 985.420873ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:27 | 200 | 809.116µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:27.595+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:28 | 200 | 1.078693931s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:28 | 200 | 836.868µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:28.676+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:29 | 200 | 1.026126364s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:29 | 200 | 814.288µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:29.707+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:30 | 200 | 984.94677ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:30 | 200 | 817.913µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:30.694+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:31 | 200 | 1.032658607s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:31 | 200 | 794.625µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:31.730+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:32 | 200 | 1.066533978s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:33 | 200 | 836.575µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:33.168+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:34 | 200 | 1.447059891s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:34 | 200 | 841.524µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:34.618+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:35 | 200 | 1.027703899s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:35 | 200 | 863.036µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:35.648+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:36 | 200 | 994.915362ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:36 | 200 | 803.869µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:36.647+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:38 | 200 | 1.58195298s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:38 | 200 | 884.533µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:38.232+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:39 | 200 | 1.142600246s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:39 | 200 | 825.11µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:39.377+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:40 | 200 | 1.080568844s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:40 | 200 | 815.512µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:40.462+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:41 | 200 | 1.146987692s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:41 | 200 | 811.222µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:41.611+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:42 | 200 | 979.825221ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:42 | 200 | 852.475µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:42.594+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:43 | 200 | 982.748158ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:43 | 200 | 820.496µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:43.581+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:44 | 200 | 1.100700606s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:44 | 200 | 862.996µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:44.684+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:46 | 200 | 1.497456592s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:46 | 200 | 853.133µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:46.185+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:47 | 200 | 1.064251921s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:47 | 200 | 890.363µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:47.253+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:48 | 200 | 919.330615ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:48 | 200 | 887.668µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:48.175+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:49 | 200 | 937.752028ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:49 | 200 | 826.643µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:49.116+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:50 | 200 | 991.541073ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:50 | 200 | 815.88µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:50.112+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:51 | 200 | 997.471176ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:51 | 200 | 835.182µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:51.112+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:52 | 200 | 1.051647686s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:52 | 200 | 817.042µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:52.166+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:53 | 200 | 1.104912045s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:53 | 200 | 834.291µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:53.275+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:54 | 200 | 945.281302ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:54 | 200 | 1.186419ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:54.224+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:55 | 200 | 898.399962ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:55 | 200 | 796.173µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:55.125+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:56 | 200 | 1.108278874s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:56 | 200 | 877.902µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:56.237+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:57 | 200 | 922.307376ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:57 | 200 | 818.859µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:57.162+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:58 | 200 | 1.093775179s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:58 | 200 | 826.919µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:58.259+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:49:59 | 200 | 937.066277ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:49:59 | 200 | 817.475µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:49:59.200+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:00 | 200 | 902.689181ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:00 | 200 | 830.142µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:00.105+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:01 | 200 | 1.583952845s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:01 | 200 | 827.615µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:01.692+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:02 | 200 | 1.105850744s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:02 | 200 | 842.06µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:02.802+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:03 | 200 | 928.793041ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:03 | 200 | 827.222µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:03.733+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:04 | 200 | 910.071569ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:04 | 200 | 1.226551ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:04.648+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:05 | 200 | 936.489608ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:05 | 200 | 833.971µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:05.590+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:06 | 200 | 1.164017222s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:06 | 200 | 832.653µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:06.754+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:07 | 200 | 1.022418572s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:07 | 200 | 837.561µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:07.781+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:08 | 200 | 969.502488ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:08 | 200 | 779.343µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:08.753+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:09 | 200 | 961.229721ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:09 | 200 | 821.739µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:09.717+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:10 | 200 | 1.200781498s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:10 | 200 | 838.319µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:10.921+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:11 | 200 | 1.050320821s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:11 | 200 | 814.57µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:11.975+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:12 | 200 | 985.961573ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:12 | 200 | 904.037µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:12.964+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:14 | 200 | 1.193615905s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:14 | 200 | 820.404µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:14.160+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:15 | 200 | 959.403609ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:15 | 200 | 831.935µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:15.124+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:16 | 200 | 1.176947939s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:16 | 200 | 831.985µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:16.303+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:17 | 200 | 923.669162ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:17 | 200 | 784.075µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:17.230+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:18 | 200 | 969.61224ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:18 | 200 | 806.076µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:18.203+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:19 | 200 | 1.001123719s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:19 | 200 | 812.874µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:19.207+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:20 | 200 | 948.493703ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:20 | 200 | 829.258µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:20.159+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:21 | 200 | 929.241283ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:21 | 200 | 857.089µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:21.092+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:22 | 200 | 1.107269829s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:22 | 200 | 805.287µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:22.202+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:23 | 200 | 1.089536435s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:50:23 | 200 | 836.418µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:50:23.295+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:50:23 | 500 | 494.37497ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:36 | 200 | 819.058µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:36.739+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-13T16:51:36.894+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-13T16:51:36.925+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-13T16:51:36.929+02:00 level=INFO source=sched.go:517 msg="updated VRAM based on existing loaded models" gpu=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 library=cuda total="23.6 GiB" available="12.6 GiB" time=2025-05-13T16:51:36.929+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0 time=2025-05-13T16:51:36.930+02:00 level=INFO source=sched.go:754 msg="new model will fit in available VRAM in single GPU, loading" model=/home/leo/.ollama/models/blobs/sha256-2fc97d5d1c3217ac90931d7992762a418e09b2c5c49c030c6c1f9e8fbb221013 gpu=GPU-d744590e-2a3b-8e2e-f4bc-988c67d6c902 parallel=2 available=13522255936 required="10.0 GiB" time=2025-05-13T16:51:37.047+02:00 level=INFO source=server.go:106 msg="system memory" total="31.2 GiB" free="25.2 GiB" free_swap="542.6 MiB" time=2025-05-13T16:51:37.047+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0 time=2025-05-13T16:51:37.048+02:00 level=INFO source=server.go:139 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=49 layers.split="" memory.available="[12.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="10.0 GiB" memory.required.partial="10.0 GiB" memory.required.kv="1.3 GiB" memory.required.allocations="[10.0 GiB]" memory.weights.total="7.6 GiB" memory.weights.repeating="6.8 GiB" memory.weights.nonrepeating="787.7 MiB" memory.graph.full="519.6 MiB" memory.graph.partial="1.3 GiB" time=2025-05-13T16:51:37.098+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-13T16:51:37.099+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0 time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0 time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.num_channels default=0 time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0 time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.embedding_length default=0 time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.head_count default=0 time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0 time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0 time=2025-05-13T16:51:37.101+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.layer_norm_epsilon default=0 time=2025-05-13T16:51:37.103+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-13T16:51:37.103+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-13T16:51:37.103+02:00 level=INFO source=server.go:410 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/leo/.ollama/models/blobs/sha256-2fc97d5d1c3217ac90931d7992762a418e09b2c5c49c030c6c1f9e8fbb221013 --ctx-size 8192 --batch-size 512 --n-gpu-layers 49 --threads 10 --parallel 2 --port 42857" time=2025-05-13T16:51:37.103+02:00 level=INFO source=sched.go:452 msg="loaded runners" count=2 time=2025-05-13T16:51:37.104+02:00 level=INFO source=server.go:589 msg="waiting for llama runner to start responding" time=2025-05-13T16:51:37.104+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server not responding" time=2025-05-13T16:51:37.111+02:00 level=INFO source=runner.go:851 msg="starting ollama engine" time=2025-05-13T16:51:37.112+02:00 level=INFO source=runner.go:914 msg="Server listening on 127.0.0.1:42857" time=2025-05-13T16:51:37.165+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-05-13T16:51:37.166+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.name default="" time=2025-05-13T16:51:37.166+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default="" time=2025-05-13T16:51:37.166+02:00 level=INFO source=ggml.go:72 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1737 num_key_values=32 load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so time=2025-05-13T16:51:37.204+02:00 level=INFO source=ggml.go:103 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-05-13T16:51:37.285+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CUDA0 size="8.4 GiB" time=2025-05-13T16:51:37.285+02:00 level=INFO source=ggml.go:298 msg="model weights" buffer=CPU size="787.7 MiB" time=2025-05-13T16:51:37.355+02:00 level=INFO source=server.go:623 msg="waiting for server to become available" status="llm server loading model" time=2025-05-13T16:51:38.979+02:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.num_channels default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.embedding_length default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.head_count default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0 time=2025-05-13T16:51:38.981+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.layer_norm_epsilon default=0 time=2025-05-13T16:51:38.983+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-05-13T16:51:38.983+02:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-05-13T16:51:39.009+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="308.0 MiB" time=2025-05-13T16:51:39.009+02:00 level=INFO source=ggml.go:553 msg="compute graph" backend=CPU buffer_type=CPU size="7.5 MiB" time=2025-05-13T16:51:39.109+02:00 level=INFO source=server.go:628 msg="llama runner started in 2.01 seconds" [GIN] 2025/05/13 - 16:51:40 | 200 | 3.625931541s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:40 | 200 | 1.304686ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:40.368+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:41 | 200 | 1.173288276s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:41 | 200 | 884.953µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:41.545+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:42 | 200 | 1.093041205s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:42 | 200 | 876.112µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:42.641+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:43 | 200 | 1.190148307s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:43 | 200 | 927.104µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:43.835+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:44 | 200 | 1.117178999s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:44 | 200 | 805.534µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:44.955+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:46 | 200 | 1.188636665s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:46 | 200 | 1.21855ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:46.148+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:47 | 200 | 1.126001863s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:47 | 200 | 842.821µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:47.277+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:48 | 200 | 1.062130448s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:48 | 200 | 853.01µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:48.342+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:49 | 200 | 1.192205675s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:49 | 200 | 828.288µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:49.537+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:50 | 200 | 1.064617413s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:50 | 200 | 826.038µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:50.605+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:51 | 200 | 1.181298946s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:51 | 200 | 912.704µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:51.792+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:52 | 200 | 1.157731531s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:52 | 200 | 827.216µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:52.951+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:54 | 200 | 1.144311709s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:54 | 200 | 852.097µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:54.098+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:55 | 200 | 1.151178661s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:55 | 200 | 823.791µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:55.253+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:56 | 200 | 1.098051386s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:56 | 200 | 830.556µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:56.354+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:57 | 200 | 1.095882359s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:57 | 200 | 823.501µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:57.455+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:58 | 200 | 1.192077006s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:58 | 200 | 830.431µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:58.649+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:51:59 | 200 | 1.179899598s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:51:59 | 200 | 865.571µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:51:59.831+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:00 | 200 | 1.190354606s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:00 | 200 | 823.648µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:01.025+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:02 | 200 | 1.058573208s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:02 | 200 | 808.631µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:02.087+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:03 | 200 | 1.189769749s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:03 | 200 | 843.265µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:03.282+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:04 | 200 | 1.148332443s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:04 | 200 | 828.562µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:04.432+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:05 | 200 | 1.116938469s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:05 | 200 | 840.112µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:05.552+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:06 | 200 | 1.185360654s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:06 | 200 | 830.644µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:06.741+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:07 | 200 | 1.187190771s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:07 | 200 | 837.899µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:07.931+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:09 | 200 | 1.200790074s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:09 | 200 | 899.137µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:09.135+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:10 | 200 | 1.135805072s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:10 | 200 | 1.33389ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:10.275+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:11 | 200 | 1.178719063s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:11 | 200 | 792.231µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:11.457+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:12 | 200 | 1.068732647s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:12 | 200 | 1.358903ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:12.530+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:13 | 200 | 1.084584064s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:13 | 200 | 815.049µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:13.617+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:14 | 200 | 1.099273485s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:14 | 200 | 872.335µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:14.720+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:15 | 200 | 1.291627781s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:15 | 200 | 853.691µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:16.017+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:17 | 200 | 1.165006914s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:17 | 200 | 857.917µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:17.183+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:18 | 200 | 1.185455866s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:18 | 200 | 851.02µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:18.372+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:19 | 200 | 1.182260809s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:19 | 200 | 833.657µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:19.558+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:20 | 200 | 1.100934571s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:20 | 200 | 838.036µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:20.662+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:21 | 200 | 1.067580922s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:21 | 200 | 845.81µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:21.734+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:22 | 200 | 824.388844ms | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:22 | 200 | 851.607µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:22.561+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:23 | 200 | 1.157841226s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:23 | 200 | 808.772µs | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:23.722+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:24 | 200 | 1.226060831s | 127.0.0.1 | POST "/api/chat" [GIN] 2025/05/13 - 16:52:24 | 200 | 1.02996ms | 127.0.0.1 | GET "/api/tags" time=2025-05-13T16:52:24.951+02:00 level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 [GIN] 2025/05/13 - 16:52:26 | 200 | 1.300700119s | 127.0.0.1 | POST "/api/chat" ``` ### OS Linux ### GPU Nvidia ### CPU Intel ### Ollama version 0.6.8

GiteaMirror added the bug label 2026-05-04 17:07:34 -05:00

GiteaMirror closed this issue

2026-05-04 17:07:36 -05:00

GiteaMirror commented

2026-05-04 17:07:37 -05:00

@leokeba commented on GitHub (May 13, 2025):

Here's a minimal script to reproduce the issue :

from pydantic import BaseModel
from enum import StrEnum
from typing import List
import ollama

class TweetSubject(StrEnum):
    REGLES_DE_CRISE_ADAPTATIONS = "Règles de crise / adaptations"
    MEDECINE_PROTECTION_SANITAIRE = "Médecine / protection sanitaire"
    VACCIN = "Vaccin"
    PENURIE_GESTION_MATERIEL = "Pénurie / gestion matériel"
    PASS_SANITAIRE = "Pass sanitaire"
    AUTRES = "Autres"
    NSP = "NSP"

subjects_description = {
    TweetSubject.REGLES_DE_CRISE_ADAPTATIONS: "règles édictées pour répondre à la situation de pandémie de covid19 et façons de s’y adapter. Aspect légal, autorisations, interdictions, etc.",
    TweetSubject.MEDECINE_PROTECTION_SANITAIRE: "sujets médicaux, liés au covid19 ou à d’autres maladies, aux risques qu’elles entraînent et aux façons de s’en protéger.",
    TweetSubject.VACCIN: "vaccin contre le covid19. Ne se combine pas avec un autre sujet sauf si plusieurs questions bien distinctes.",
    TweetSubject.PENURIE_GESTION_MATERIEL: "gestion par l’état du matériel nécessaire pour lutter contre le covid19, y compris les moyens humains des hôpitaux, le prix des moyens de protection et leur accessibilité.",
    TweetSubject.PASS_SANITAIRE: "pass sanitaire, passeport vaccinal ou toute différenciation de droit entre personnes vaccinées et personnes non-vaccinées, ou toute autorisation ou interdiction liée au résultat d’un test. Si le sujet « pass sanitaire » est présent dans le message, pas de combinaison avec un autre. Si les sujets « pass sanitaire » et « vaccin » sont présents, indiquer « pass sanitaire », sauf si plusieurs questions bien distinctes.",
    TweetSubject.AUTRES: "sujet qui ne rentre dans aucune des autres catégories.",
    TweetSubject.NSP: "s’il n’y a pas d’élément susceptible d’éclairer sur le sujet du tweet"
}

class TweetEmotion(StrEnum):
    MECONTENTEMENT_COLERE = "Mécontentement / colère"
    GRATITUDE_VALIDATION = "Gratitude / validation"
    NEUTRE = "Neutre"
    AUTRE = "Autre"
    NSP = "NSP"

emotions_description = {
    TweetEmotion.MECONTENTEMENT_COLERE: "expression d’une émotion pouvant aller d’un léger mécontentement à la colère la plus violente.",
    TweetEmotion.GRATITUDE_VALIDATION: "expression d’une gratitude ou validation de propos ou d’actions.",
    TweetEmotion.NEUTRE: "pas d’émotion particulière détectable.",
    TweetEmotion.AUTRE: "expression d’une émotion qui ne rentre dans aucune des autres catégories.",
    TweetEmotion.NSP: "s’il n’y a pas d’élément susceptible d’éclairer sur l’émotion du tweet."
}

class TweetAnalysis(BaseModel):
    subject: List[TweetSubject]
    emotion: List[TweetEmotion]

subject_list = [subject.value for subject in TweetSubject]
emotion_list = [emotion.value for emotion in TweetEmotion]

def build_analysis_prompt(tweet):
    subject_list_string = '''
    - '''.join(f"'{subject}' : {subjects_description[subject]}" for subject in subject_list)
    emotion_list_string = '''
    - '''.join(f"'{emotion}' : {emotions_description[emotion]}" for emotion in emotion_list)

    prompt = f"""Analyze the following tweet and return the subjects and emotions that best characterizes its contents as a json object.
                Do not return more than 2 subjects and 2 emotions.
    
                Tweet: {tweet}

                Subjects: 
                - {subject_list_string}

                Emotions: 
                - {emotion_list_string}"""
    return prompt

def get_ollama_structured_response(prompt: str, model: BaseModel):
        """
        Sends a prompt to the Ollama model and returns the response content.
        """
        response = ollama.chat(
            model='gemma3:12b', 
            messages=[
                {
                    'role': 'user',
                    'content': prompt,
                }
            ],
            format=model.model_json_schema(),
        )
        return model.model_validate_json(response['message']['content'])

tweet1 = '''Il n'y a pas que la Chine et l'Italie dont s'inspirer         
Taiwan semble exemplaire dans son traitement du #Coronavirus  #Onvousrépond 
https://t.co/kr8LE9C0vg'''

tweet2 = '@la_muse88 Ça dépendra des résultats du 1er tour...si LR est en tête pas de confinement avant le 2e tour, sinon....#OnVousRepond #France2'

tweet3 = '''1- 50% des malades en réanimation on moins de 65ans
2- Est-ce que le fait d'être fumeur est un facteur à risque ?
3-Pourquoi les bureaux de tabac restent ouverts (non alimentaire)? 
 #michelcimes #France2 #OnVousRepond #COVIDー19'''

while 1:
    response = get_ollama_structured_response(build_analysis_prompt(tweet2), TweetAnalysis)
    print(response)
    response = get_ollama_structured_response(build_analysis_prompt(tweet1), TweetAnalysis)
    print(response)
    response = get_ollama_structured_response(build_analysis_prompt(tweet3), TweetAnalysis)
    print(response)

@leokeba commented on GitHub (May 13, 2025): Here's a minimal script to reproduce the issue : ``` from pydantic import BaseModel from enum import StrEnum from typing import List import ollama class TweetSubject(StrEnum): REGLES_DE_CRISE_ADAPTATIONS = "Règles de crise / adaptations" MEDECINE_PROTECTION_SANITAIRE = "Médecine / protection sanitaire" VACCIN = "Vaccin" PENURIE_GESTION_MATERIEL = "Pénurie / gestion matériel" PASS_SANITAIRE = "Pass sanitaire" AUTRES = "Autres" NSP = "NSP" subjects_description = { TweetSubject.REGLES_DE_CRISE_ADAPTATIONS: "règles édictées pour répondre à la situation de pandémie de covid19 et façons de s’y adapter. Aspect légal, autorisations, interdictions, etc.", TweetSubject.MEDECINE_PROTECTION_SANITAIRE: "sujets médicaux, liés au covid19 ou à d’autres maladies, aux risques qu’elles entraînent et aux façons de s’en protéger.", TweetSubject.VACCIN: "vaccin contre le covid19. Ne se combine pas avec un autre sujet sauf si plusieurs questions bien distinctes.", TweetSubject.PENURIE_GESTION_MATERIEL: "gestion par l’état du matériel nécessaire pour lutter contre le covid19, y compris les moyens humains des hôpitaux, le prix des moyens de protection et leur accessibilité.", TweetSubject.PASS_SANITAIRE: "pass sanitaire, passeport vaccinal ou toute différenciation de droit entre personnes vaccinées et personnes non-vaccinées, ou toute autorisation ou interdiction liée au résultat d’un test. Si le sujet « pass sanitaire » est présent dans le message, pas de combinaison avec un autre. Si les sujets « pass sanitaire » et « vaccin » sont présents, indiquer « pass sanitaire », sauf si plusieurs questions bien distinctes.", TweetSubject.AUTRES: "sujet qui ne rentre dans aucune des autres catégories.", TweetSubject.NSP: "s’il n’y a pas d’élément susceptible d’éclairer sur le sujet du tweet" } class TweetEmotion(StrEnum): MECONTENTEMENT_COLERE = "Mécontentement / colère" GRATITUDE_VALIDATION = "Gratitude / validation" NEUTRE = "Neutre" AUTRE = "Autre" NSP = "NSP" emotions_description = { TweetEmotion.MECONTENTEMENT_COLERE: "expression d’une émotion pouvant aller d’un léger mécontentement à la colère la plus violente.", TweetEmotion.GRATITUDE_VALIDATION: "expression d’une gratitude ou validation de propos ou d’actions.", TweetEmotion.NEUTRE: "pas d’émotion particulière détectable.", TweetEmotion.AUTRE: "expression d’une émotion qui ne rentre dans aucune des autres catégories.", TweetEmotion.NSP: "s’il n’y a pas d’élément susceptible d’éclairer sur l’émotion du tweet." } class TweetAnalysis(BaseModel): subject: List[TweetSubject] emotion: List[TweetEmotion] subject_list = [subject.value for subject in TweetSubject] emotion_list = [emotion.value for emotion in TweetEmotion] def build_analysis_prompt(tweet): subject_list_string = ''' - '''.join(f"'{subject}' : {subjects_description[subject]}" for subject in subject_list) emotion_list_string = ''' - '''.join(f"'{emotion}' : {emotions_description[emotion]}" for emotion in emotion_list) prompt = f"""Analyze the following tweet and return the subjects and emotions that best characterizes its contents as a json object. Do not return more than 2 subjects and 2 emotions. Tweet: {tweet} Subjects: - {subject_list_string} Emotions: - {emotion_list_string}""" return prompt def get_ollama_structured_response(prompt: str, model: BaseModel): """ Sends a prompt to the Ollama model and returns the response content. """ response = ollama.chat( model='gemma3:12b', messages=[ { 'role': 'user', 'content': prompt, } ], format=model.model_json_schema(), ) return model.model_validate_json(response['message']['content']) tweet1 = '''Il n'y a pas que la Chine et l'Italie dont s'inspirer Taiwan semble exemplaire dans son traitement du #Coronavirus #Onvousrépond https://t.co/kr8LE9C0vg''' tweet2 = '@la_muse88 Ça dépendra des résultats du 1er tour...si LR est en tête pas de confinement avant le 2e tour, sinon....#OnVousRepond #France2' tweet3 = '''1- 50% des malades en réanimation on moins de 65ans 2- Est-ce que le fait d'être fumeur est un facteur à risque ? 3-Pourquoi les bureaux de tabac restent ouverts (non alimentaire)? #michelcimes #France2 #OnVousRepond #COVIDー19''' while 1: response = get_ollama_structured_response(build_analysis_prompt(tweet2), TweetAnalysis) print(response) response = get_ollama_structured_response(build_analysis_prompt(tweet1), TweetAnalysis) print(response) response = get_ollama_structured_response(build_analysis_prompt(tweet3), TweetAnalysis) print(response) ```

GiteaMirror commented

2026-05-04 17:07:38 -05:00

@rick-github commented on GitHub (May 13, 2025):

#!/usr/bin/env python3

from pydantic import BaseModel
import ollama

ollama_url = "http://localhost:11434"
model = "gemma3:12b"

class result(BaseModel):
    parmeter1: str

messages=[
    {"role": "system", "content": "Extract the information."},
    {"role": "user", "content": "parameter 1 is 'hello'."},
]

def f():
  response = ollama.chat(
      model=model,
      messages=messages,
      format=result.model_json_schema())
  r = result.model_validate_json(response.message.content)
  print(r)

f()

Linear increase points to unreleased buffer,

@rick-github commented on GitHub (May 13, 2025): ```python #!/usr/bin/env python3 from pydantic import BaseModel import ollama ollama_url = "http://localhost:11434" model = "gemma3:12b" class result(BaseModel): parmeter1: str messages=[ {"role": "system", "content": "Extract the information."}, {"role": "user", "content": "parameter 1 is 'hello'."}, ] def f(): response = ollama.chat( model=model, messages=messages, format=result.model_json_schema()) r = result.model_validate_json(response.message.content) print(r) f() ``` ![Image](https://github.com/user-attachments/assets/8b7f96fd-08ed-458a-9bd8-412cbed967a1) Linear increase points to unreleased buffer,

GiteaMirror commented

2026-05-04 17:07:40 -05:00

@ParthSareen commented on GitHub (May 13, 2025):

Sorry about that! And thanks @leokeba @rick-github for the logs and the data. Will look into it

@ParthSareen commented on GitHub (May 13, 2025): Sorry about that! And thanks @leokeba @rick-github for the logs and the data. Will look into it

GiteaMirror commented

2026-05-04 17:07:41 -05:00

@rohanshad commented on GitHub (Jun 8, 2025):

Hi all,
I'm actually running into this issue issue even with v 0.9.0. RAM usage creeps up with each returned response of the chat api using gemma3 family of models till it saturates the system. There's minimal memory usage in the python script when profiled, and the log output is essentially the same as OP. With htop, I see multiple instances of ollama runner that stepwise increase MEM% usage as responses are processed.

@rohanshad commented on GitHub (Jun 8, 2025): Hi all, I'm actually running into this issue issue even with v 0.9.0. RAM usage creeps up with each returned response of the chat api using gemma3 family of models till it saturates the system. There's minimal memory usage in the python script when profiled, and the log output is essentially the same as OP. With htop, I see multiple instances of ollama runner that stepwise increase MEM% usage as responses are processed.

GiteaMirror commented

2026-05-04 17:07:41 -05:00

@rick-github commented on GitHub (Jun 8, 2025):

After some growth at the start, RSS was pretty constant over the course of an hour.

It may be a function of the type of grammar that is being created. Can you provide a simple script like the one above that demonstrates the problem?

@rick-github commented on GitHub (Jun 8, 2025): After some growth at the start, RSS was pretty constant over the course of an hour. ![Image](https://github.com/user-attachments/assets/2de66cc0-d594-49e1-b252-378cdb4c6c76) It may be a function of the type of grammar that is being created. Can you provide a simple script like the one above that demonstrates the problem?

GiteaMirror commented

2026-05-04 17:07:42 -05:00

@rohanshad commented on GitHub (Jun 8, 2025):

This is what my script kinda looks like. Strucured json response with an instance of AsyncClient spawned on each GPU. I can give you the whole script if needed. I'm pretty convinced the asyncio wrappers themselves aren't the issue, workers are spawned and terminate appropriately with no obvious leak on profiling. I can send you the whole script separately too if need be.


        async def chat(self, report_text, host, gpu_index, client):
		'''
		Core interface to Ollama server via AsyncClient 
		System message is hard coded 
		Report text fed in via worker function
		'''
		try:
			# Make the API request using the specified host and model
			response = await client.chat(
				model=self.model,
				messages=[{"role":"user", "content":self.system_msg+report_text}],
				format="json"
			)
			if self.debug:
				print('RAW OLLAMA RESPONSE:')
				print(response["message"]["content"])

			# Extract the response content
			response_json = json.loads(response['message']['content'].replace('\n','').strip().removeprefix('```json').removesuffix('```'))
			total_time = response['total_duration']/1e+9
			report_tokens = response['prompt_eval_count'] 

			return response_json, total_time, report_tokens

		except Exception as e:
			print(f"Error on host {host}")
			print(e)


	async def worker(self, host, gpu_index, task_queue):
		'''
		Worker function to pass report into chat function and return labelled result 
		Writes to list to collect outputs from all workers with lock 
		'''

		#Init ollama async client one time per worker
		client = ollama.AsyncClient(host=host)

		while not task_queue.empty():
	                report_text = await task_queue.get()

			# Prep and parse inputs
			response, total_time, report_tokens = await self.chat(report_text, host, gpu_index, client)
			
			record = {
			'total_time': total_time,
			'report_tokens': report_tokens,
			'report_json': response,
			}
			print(f'Labelling complete:{filename} | {host}:GPU#{gpu_index} | {round(total_time,2)}s')

			with self.lock:
				self.worker_outputs.append(record)

			del report_text
			del response
			del record

			task_queue.task_done()

@rohanshad commented on GitHub (Jun 8, 2025): This is what my script kinda looks like. Strucured json response with an instance of `AsyncClient` spawned on each GPU. I can give you the whole script if needed. I'm pretty convinced the asyncio wrappers themselves aren't the issue, workers are spawned and terminate appropriately with no obvious leak on profiling. I can send you the whole script separately too if need be. ```python async def chat(self, report_text, host, gpu_index, client): ''' Core interface to Ollama server via AsyncClient System message is hard coded Report text fed in via worker function ''' try: # Make the API request using the specified host and model response = await client.chat( model=self.model, messages=[{"role":"user", "content":self.system_msg+report_text}], format="json" ) if self.debug: print('RAW OLLAMA RESPONSE:') print(response["message"]["content"]) # Extract the response content response_json = json.loads(response['message']['content'].replace('\n','').strip().removeprefix('```json').removesuffix('```')) total_time = response['total_duration']/1e+9 report_tokens = response['prompt_eval_count'] return response_json, total_time, report_tokens except Exception as e: print(f"Error on host {host}") print(e) async def worker(self, host, gpu_index, task_queue): ''' Worker function to pass report into chat function and return labelled result Writes to list to collect outputs from all workers with lock ''' #Init ollama async client one time per worker client = ollama.AsyncClient(host=host) while not task_queue.empty(): report_text = await task_queue.get() # Prep and parse inputs response, total_time, report_tokens = await self.chat(report_text, host, gpu_index, client) record = { 'total_time': total_time, 'report_tokens': report_tokens, 'report_json': response, } print(f'Labelling complete:{filename} | {host}:GPU#{gpu_index} | {round(total_time,2)}s') with self.lock: self.worker_outputs.append(record) del report_text del response del record task_queue.task_done() ```

GiteaMirror commented

2026-05-04 17:07:43 -05:00

@rick-github commented on GitHub (Jun 8, 2025):

I was unable to duplicate with this script fragment, so I think the whole script and some example input is required. Server logs with OLLAMA_DEBUG=1 may also add some insight.

@rick-github commented on GitHub (Jun 8, 2025): I was unable to duplicate with this script fragment, so I think the whole script and some example input is required. [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) with `OLLAMA_DEBUG=1` may also add some insight.

GiteaMirror commented

2026-05-04 17:07:44 -05:00

@rohanshad commented on GitHub (Jun 15, 2025):

Will send and prep a reproducible script soon, but here's server log with OLLAMA_DEBUG=1:

time=2025-06-14T20:24:09.722-04:00 level=INFO source=routes.go:1234 msg="server config" env="map[CUDA_VISIBLE_DEVICES:1 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://localhost:11433 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2562047h47m16.854775807s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:2h0m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-06-14T20:24:09.723-04:00 level=INFO source=images.go:479 msg="total blobs: 21"
time=2025-06-14T20:24:09.723-04:00 level=INFO source=images.go:486 msg="total unused blobs removed: 0"
time=2025-06-14T20:24:09.723-04:00 level=INFO source=routes.go:1287 msg="Listening on 127.0.0.1:11433 (version 0.9.0)"
time=2025-06-14T20:24:09.724-04:00 level=DEBUG source=sched.go:108 msg="starting llm scheduler"
time=2025-06-14T20:24:09.724-04:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-06-14T20:24:09.728-04:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-06-14T20:24:09.728-04:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so*
time=2025-06-14T20:24:09.728-04:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/usr/local/lib/ollama/libcuda.so* /home/rohanshad/Hiesinger Lab Dropbox/MRI ML/cmr_core/utils/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2025-06-14T20:24:09.733-04:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths="[/usr/lib/i386-linux-gnu/libcuda.so.535.230.02 /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02]"
initializing /usr/lib/i386-linux-gnu/libcuda.so.535.230.02
library /usr/lib/i386-linux-gnu/libcuda.so.535.230.02 load err: /usr/lib/i386-linux-gnu/libcuda.so.535.230.02: wrong ELF class: ELFCLASS32
time=2025-06-14T20:24:09.733-04:00 level=DEBUG source=gpu.go:609 msg="skipping 32bit library" library=/usr/lib/i386-linux-gnu/libcuda.so.535.230.02
initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
dlsym: cuInit - 0x7ed2504c2470
dlsym: cuDriverGetVersion - 0x7ed2504c2490
dlsym: cuDeviceGetCount - 0x7ed2504c24d0
dlsym: cuDeviceGet - 0x7ed2504c24b0
dlsym: cuDeviceGetAttribute - 0x7ed2504c25b0
dlsym: cuDeviceGetUuid - 0x7ed2504c2510
dlsym: cuDeviceGetName - 0x7ed2504c24f0
dlsym: cuCtxCreate_v3 - 0x7ed2504ca170
dlsym: cuMemGetInfo_v2 - 0x7ed2504d5640
dlsym: cuCtxDestroy - 0x7ed250524640
calling cuInit
calling cuDriverGetVersion
raw version 0x2ef4
CUDA driver version: 12.2
calling cuDeviceGetCount
device count 1
time=2025-06-14T20:24:09.740-04:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
[GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d] CUDA totalMem 24245mb
[GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d] CUDA freeMem 22415mb
[GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d] Compute Capability 8.6
time=2025-06-14T20:24:09.852-04:00 level=DEBUG source=amd_linux.go:419 msg="amdgpu driver not detected /sys/module/amdgpu"
releasing cuda driver library
time=2025-06-14T20:24:09.852-04:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d library=cuda variant=v12 compute=8.6 driver=12.2 name="NVIDIA RTX A5000" total="23.7 GiB" available="21.9 GiB"
time=2025-06-14T20:24:19.847-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:19.848-04:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.6 GiB" before.free="114.4 GiB" before.free_swap="327.6 MiB" now.total="125.6 GiB" now.free="114.4 GiB" now.free_swap="327.6 MiB"
initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
dlsym: cuInit - 0x7ed2504c2470
dlsym: cuDriverGetVersion - 0x7ed2504c2490
dlsym: cuDeviceGetCount - 0x7ed2504c24d0
dlsym: cuDeviceGet - 0x7ed2504c24b0
dlsym: cuDeviceGetAttribute - 0x7ed2504c25b0
dlsym: cuDeviceGetUuid - 0x7ed2504c2510
dlsym: cuDeviceGetName - 0x7ed2504c24f0
dlsym: cuCtxCreate_v3 - 0x7ed2504ca170
dlsym: cuMemGetInfo_v2 - 0x7ed2504d5640
dlsym: cuCtxDestroy - 0x7ed250524640
calling cuInit
calling cuDriverGetVersion
raw version 0x2ef4
CUDA driver version: 12.2
calling cuDeviceGetCount
device count 1
time=2025-06-14T20:24:20.129-04:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d name="NVIDIA RTX A5000" overhead="0 B" before.total="23.7 GiB" before.free="21.9 GiB" now.total="23.7 GiB" now.free="21.9 GiB" now.used="1.8 GiB"
releasing cuda driver library
time=2025-06-14T20:24:20.129-04:00 level=DEBUG source=sched.go:185 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-06-14T20:24:20.179-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:20.211-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:20.213-04:00 level=DEBUG source=sched.go:228 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:24:20.213-04:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[21.9 GiB]"
time=2025-06-14T20:24:20.217-04:00 level=INFO source=sched.go:788 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 gpu=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d parallel=1 available=23504683008 required="19.9 GiB"
time=2025-06-14T20:24:20.218-04:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.6 GiB" before.free="114.4 GiB" before.free_swap="327.6 MiB" now.total="125.6 GiB" now.free="114.3 GiB" now.free_swap="327.6 MiB"
initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
dlsym: cuInit - 0x7ed2504c2470
dlsym: cuDriverGetVersion - 0x7ed2504c2490
dlsym: cuDeviceGetCount - 0x7ed2504c24d0
dlsym: cuDeviceGet - 0x7ed2504c24b0
dlsym: cuDeviceGetAttribute - 0x7ed2504c25b0
dlsym: cuDeviceGetUuid - 0x7ed2504c2510
dlsym: cuDeviceGetName - 0x7ed2504c24f0
dlsym: cuCtxCreate_v3 - 0x7ed2504ca170
dlsym: cuMemGetInfo_v2 - 0x7ed2504d5640
dlsym: cuCtxDestroy - 0x7ed250524640
calling cuInit
calling cuDriverGetVersion
raw version 0x2ef4
CUDA driver version: 12.2
calling cuDeviceGetCount
device count 1
time=2025-06-14T20:24:20.465-04:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d name="NVIDIA RTX A5000" overhead="0 B" before.total="23.7 GiB" before.free="21.9 GiB" now.total="23.7 GiB" now.free="21.9 GiB" now.used="1.8 GiB"
releasing cuda driver library
time=2025-06-14T20:24:20.465-04:00 level=INFO source=server.go:135 msg="system memory" total="125.6 GiB" free="114.3 GiB" free_swap="327.6 MiB"
time=2025-06-14T20:24:20.465-04:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[21.9 GiB]"
time=2025-06-14T20:24:20.467-04:00 level=INFO source=server.go:168 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split="" memory.available="[21.9 GiB]" memory.gpu_overhead="0 B" memory.required.full="19.9 GiB" memory.required.partial="19.9 GiB" memory.required.kv="944.0 MiB" memory.required.allocations="[19.9 GiB]" memory.weights.total="16.0 GiB" memory.weights.repeating="13.4 GiB" memory.weights.nonrepeating="2.6 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" projector.weights="806.2 MiB" projector.graph="1.0 GiB"
time=2025-06-14T20:24:20.467-04:00 level=DEBUG source=server.go:284 msg="compatible gpu libraries" compatible="[cuda_v12 cuda_v11]"
time=2025-06-14T20:24:20.539-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:20.541-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106
time=2025-06-14T20:24:20.541-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=server.go:360 msg="adding gpu library" path=/usr/local/lib/ollama/cuda_v12
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=server.go:367 msg="adding gpu dependency paths" paths=[/usr/local/lib/ollama/cuda_v12]
time=2025-06-14T20:24:20.544-04:00 level=INFO source=server.go:431 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 --ctx-size 4096 --batch-size 512 --n-gpu-layers 63 --threads 32 --parallel 1 --port 43009"
time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_LOAD_TIMEOUT=120m OLLAMA_HOST=http://localhost:11433 OLLAMA_DEBUG=1 CUDA_VISIBLE_DEVICES=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d OLLAMA_KEEP_ALIVE=-1 PATH=/home/rohanshad/.local/bin:/home/rohanshad/anaconda3/envs/cmr_dev/bin:/home/rohanshad/anaconda3/condabin:/home/rohanshad/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin OLLAMA_MODELS=/usr/share/ollama/.ollama/models OLLAMA_FLASH_ATTENTION=0 OLLAMA_NUM_PARALLEL=1 OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12 LD_LIBRARY_PATH=/usr/local/lib/ollama/cuda_v12:/usr/local/lib/ollama/cuda_v12:/usr/local/lib/ollama:/usr/local/lib/ollama
time=2025-06-14T20:24:20.545-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-06-14T20:24:20.545-04:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
time=2025-06-14T20:24:20.545-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding"
time=2025-06-14T20:24:20.561-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-06-14T20:24:20.562-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:43009"
time=2025-06-14T20:24:20.616-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:20.618-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.name default=""
time=2025-06-14T20:24:20.618-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.description default=""
time=2025-06-14T20:24:20.618-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_0 name="" description="" num_tensors=1247 num_key_values=40
time=2025-06-14T20:24:20.618-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
time=2025-06-14T20:24:20.623-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v12
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX A5000, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so
time=2025-06-14T20:24:20.698-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-06-14T20:24:20.797-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
time=2025-06-14T20:24:20.864-04:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CUDA0 size="16.8 GiB"
time=2025-06-14T20:24:20.864-04:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CPU size="2.6 GiB"
time=2025-06-14T20:24:20.864-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106
time=2025-06-14T20:24:20.864-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-06-14T20:24:21.106-04:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=972 splits=1
time=2025-06-14T20:24:21.106-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="1.1 GiB"
time=2025-06-14T20:24:21.106-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="0 B"
time=2025-06-14T20:24:21.133-04:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=2737 splits=2
time=2025-06-14T20:24:21.133-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="1.1 GiB"
time=2025-06-14T20:24:21.133-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="10.5 MiB"
time=2025-06-14T20:24:21.134-04:00 level=DEBUG source=runner.go:883 msg=memory allocated.InputWeights=2818572288A allocated.CPU.Graph=11010048A allocated.CUDA0.Weights="[232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 3676309632A]" allocated.CUDA0.Cache="[12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 0U]" allocated.CUDA0.Graph=1190150144A
time=2025-06-14T20:24:21.299-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.06"
time=2025-06-14T20:24:21.550-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.15"
time=2025-06-14T20:24:21.801-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.22"
time=2025-06-14T20:24:22.052-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.30"
time=2025-06-14T20:24:22.303-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.38"
time=2025-06-14T20:24:22.554-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.46"
time=2025-06-14T20:24:22.805-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.54"
time=2025-06-14T20:24:23.056-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.61"
time=2025-06-14T20:24:23.307-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.69"
time=2025-06-14T20:24:23.558-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.77"
time=2025-06-14T20:24:23.809-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.85"
time=2025-06-14T20:24:24.060-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.87"
time=2025-06-14T20:24:24.311-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.89"
time=2025-06-14T20:24:24.561-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.91"
time=2025-06-14T20:24:24.812-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.93"
time=2025-06-14T20:24:25.063-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.95"
time=2025-06-14T20:24:25.313-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.97"
time=2025-06-14T20:24:25.564-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.99"
time=2025-06-14T20:24:25.815-04:00 level=INFO source=server.go:630 msg="llama runner started in 5.27 seconds"
time=2025-06-14T20:24:25.815-04:00 level=DEBUG source=sched.go:495 msg="finished setting up" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:24:25.815-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3158 format="\"json\""
time=2025-06-14T20:24:26.077-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:24:26.077-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=0 prompt=791 used=0 remaining=791
[GIN] 2025/06/14 - 20:24:41 | 200 | 21.398847101s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:24:41.213-04:00 level=DEBUG source=sched.go:503 msg="context for request finished"
time=2025-06-14T20:24:41.213-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:24:41.213-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:24:41.270-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:41.272-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:24:41.272-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=2979 format="\"json\""
time=2025-06-14T20:24:41.513-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:24:41.513-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1170 prompt=759 used=0 remaining=759
[GIN] 2025/06/14 - 20:24:56 | 200 | 15.389986769s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:24:56.606-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:24:56.606-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:24:56.606-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:24:56.650-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:24:56.652-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:24:56.653-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3331 format="\"json\""
time=2025-06-14T20:24:56.823-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:24:56.824-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1138 prompt=870 used=0 remaining=870
[GIN] 2025/06/14 - 20:25:13 | 200 | 16.622287091s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:25:13.232-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:25:13.232-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:25:13.232-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:25:13.292-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:25:13.294-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:25:13.294-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=4148 format="\"json\""
time=2025-06-14T20:25:13.527-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:25:13.527-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1285 prompt=931 used=0 remaining=931
[GIN] 2025/06/14 - 20:25:30 | 200 |  16.87517067s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:25:30.113-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:25:30.113-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:25:30.113-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:25:30.172-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:25:30.174-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:25:30.174-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=2802 format="\"json\""
time=2025-06-14T20:25:30.407-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:25:30.407-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1346 prompt=721 used=0 remaining=721
[GIN] 2025/06/14 - 20:25:43 | 200 | 13.766561764s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:25:43.884-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:25:43.884-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:25:43.884-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:25:43.940-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:25:43.943-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:25:43.944-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=2965 format="\"json\""
time=2025-06-14T20:25:44.179-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:25:44.180-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1061 prompt=753 used=0 remaining=753
[GIN] 2025/06/14 - 20:26:00 | 200 | 16.590631273s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:26:00.479-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:26:00.479-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:26:00.479-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:26:00.534-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:26:00.536-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:26:00.536-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3681 format="\"json\""
time=2025-06-14T20:26:00.756-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:26:00.756-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1168 prompt=902 used=0 remaining=902
[GIN] 2025/06/14 - 20:26:17 | 200 | 16.851443096s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:26:17.335-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:26:17.335-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:26:17.335-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:26:17.391-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:26:17.393-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:26:17.394-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=4189 format="\"json\""
time=2025-06-14T20:26:17.617-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:26:17.617-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1317 prompt=1121 used=0 remaining=1121
[GIN] 2025/06/14 - 20:26:33 | 200 | 15.704443348s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:26:33.045-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:26:33.045-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:26:33.045-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:26:33.104-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:26:33.106-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:26:33.106-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3249 format="\"json\""
time=2025-06-14T20:26:33.319-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:26:33.319-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1500 prompt=832 used=0 remaining=832
[GIN] 2025/06/14 - 20:26:48 | 200 | 15.555706737s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:26:48.605-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:26:48.605-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:26:48.605-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:26:48.642-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:26:48.644-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:26:48.644-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=6622 format="\"json\""
time=2025-06-14T20:26:48.878-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:26:48.878-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1211 prompt=1567 used=0 remaining=1567
[GIN] 2025/06/14 - 20:27:05 | 200 | 16.591033158s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:27:05.200-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:27:05.201-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:27:05.201-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:27:05.255-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:27:05.257-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:27:05.257-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3115 format="\"json\""
time=2025-06-14T20:27:05.464-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:27:05.464-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1946 prompt=780 used=0 remaining=780
[GIN] 2025/06/14 - 20:27:20 | 200 | 15.367599033s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:27:20.573-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:27:20.573-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:27:20.573-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:27:20.631-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:27:20.633-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:27:20.633-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3543 format="\"json\""
time=2025-06-14T20:27:20.876-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:27:20.876-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1159 prompt=815 used=0 remaining=815
[GIN] 2025/06/14 - 20:27:35 | 200 | 15.308709032s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:27:35.886-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:27:35.886-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:27:35.886-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:27:35.921-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
time=2025-06-14T20:27:35.922-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:27:35.923-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=4373 format="\"json\""
time=2025-06-14T20:27:36.118-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
time=2025-06-14T20:27:36.118-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1194 prompt=1036 used=0 remaining=1036
[GIN] 2025/06/14 - 20:27:51 | 200 | 15.496595172s |       127.0.0.1 | POST     "/api/chat"
time=2025-06-14T20:27:51.388-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096
time=2025-06-14T20:27:51.388-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s
time=2025-06-14T20:27:51.388-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0
time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=sched.go:872 msg="shutting down runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87
time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=server.go:1023 msg="stopping llama server" pid=3566569
time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=server.go:1029 msg="waiting for llama server to exit" pid=3566569
time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=sched.go:122 msg="shutting down scheduler pending loop"
time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=sched.go:322 msg="shutting down scheduler completed loop"
time=2025-06-14T20:27:52.053-04:00 level=DEBUG source=server.go:1033 msg="llama server stopped" pid=3566569

@rohanshad commented on GitHub (Jun 15, 2025): Will send and prep a reproducible script soon, but here's server log with `OLLAMA_DEBUG=1`: ``` time=2025-06-14T20:24:09.722-04:00 level=INFO source=routes.go:1234 msg="server config" env="map[CUDA_VISIBLE_DEVICES:1 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://localhost:11433 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2562047h47m16.854775807s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:2h0m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" time=2025-06-14T20:24:09.723-04:00 level=INFO source=images.go:479 msg="total blobs: 21" time=2025-06-14T20:24:09.723-04:00 level=INFO source=images.go:486 msg="total unused blobs removed: 0" time=2025-06-14T20:24:09.723-04:00 level=INFO source=routes.go:1287 msg="Listening on 127.0.0.1:11433 (version 0.9.0)" time=2025-06-14T20:24:09.724-04:00 level=DEBUG source=sched.go:108 msg="starting llm scheduler" time=2025-06-14T20:24:09.724-04:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs" time=2025-06-14T20:24:09.728-04:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA" time=2025-06-14T20:24:09.728-04:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so* time=2025-06-14T20:24:09.728-04:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/usr/local/lib/ollama/libcuda.so* /home/rohanshad/Hiesinger Lab Dropbox/MRI ML/cmr_core/utils/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" time=2025-06-14T20:24:09.733-04:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths="[/usr/lib/i386-linux-gnu/libcuda.so.535.230.02 /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02]" initializing /usr/lib/i386-linux-gnu/libcuda.so.535.230.02 library /usr/lib/i386-linux-gnu/libcuda.so.535.230.02 load err: /usr/lib/i386-linux-gnu/libcuda.so.535.230.02: wrong ELF class: ELFCLASS32 time=2025-06-14T20:24:09.733-04:00 level=DEBUG source=gpu.go:609 msg="skipping 32bit library" library=/usr/lib/i386-linux-gnu/libcuda.so.535.230.02 initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02 dlsym: cuInit - 0x7ed2504c2470 dlsym: cuDriverGetVersion - 0x7ed2504c2490 dlsym: cuDeviceGetCount - 0x7ed2504c24d0 dlsym: cuDeviceGet - 0x7ed2504c24b0 dlsym: cuDeviceGetAttribute - 0x7ed2504c25b0 dlsym: cuDeviceGetUuid - 0x7ed2504c2510 dlsym: cuDeviceGetName - 0x7ed2504c24f0 dlsym: cuCtxCreate_v3 - 0x7ed2504ca170 dlsym: cuMemGetInfo_v2 - 0x7ed2504d5640 dlsym: cuCtxDestroy - 0x7ed250524640 calling cuInit calling cuDriverGetVersion raw version 0x2ef4 CUDA driver version: 12.2 calling cuDeviceGetCount device count 1 time=2025-06-14T20:24:09.740-04:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02 [GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d] CUDA totalMem 24245mb [GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d] CUDA freeMem 22415mb [GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d] Compute Capability 8.6 time=2025-06-14T20:24:09.852-04:00 level=DEBUG source=amd_linux.go:419 msg="amdgpu driver not detected /sys/module/amdgpu" releasing cuda driver library time=2025-06-14T20:24:09.852-04:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d library=cuda variant=v12 compute=8.6 driver=12.2 name="NVIDIA RTX A5000" total="23.7 GiB" available="21.9 GiB" time=2025-06-14T20:24:19.847-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:24:19.848-04:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.6 GiB" before.free="114.4 GiB" before.free_swap="327.6 MiB" now.total="125.6 GiB" now.free="114.4 GiB" now.free_swap="327.6 MiB" initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02 dlsym: cuInit - 0x7ed2504c2470 dlsym: cuDriverGetVersion - 0x7ed2504c2490 dlsym: cuDeviceGetCount - 0x7ed2504c24d0 dlsym: cuDeviceGet - 0x7ed2504c24b0 dlsym: cuDeviceGetAttribute - 0x7ed2504c25b0 dlsym: cuDeviceGetUuid - 0x7ed2504c2510 dlsym: cuDeviceGetName - 0x7ed2504c24f0 dlsym: cuCtxCreate_v3 - 0x7ed2504ca170 dlsym: cuMemGetInfo_v2 - 0x7ed2504d5640 dlsym: cuCtxDestroy - 0x7ed250524640 calling cuInit calling cuDriverGetVersion raw version 0x2ef4 CUDA driver version: 12.2 calling cuDeviceGetCount device count 1 time=2025-06-14T20:24:20.129-04:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d name="NVIDIA RTX A5000" overhead="0 B" before.total="23.7 GiB" before.free="21.9 GiB" now.total="23.7 GiB" now.free="21.9 GiB" now.used="1.8 GiB" releasing cuda driver library time=2025-06-14T20:24:20.129-04:00 level=DEBUG source=sched.go:185 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1 time=2025-06-14T20:24:20.179-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:24:20.211-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:24:20.213-04:00 level=DEBUG source=sched.go:228 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:24:20.213-04:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[21.9 GiB]" time=2025-06-14T20:24:20.217-04:00 level=INFO source=sched.go:788 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 gpu=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d parallel=1 available=23504683008 required="19.9 GiB" time=2025-06-14T20:24:20.218-04:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="125.6 GiB" before.free="114.4 GiB" before.free_swap="327.6 MiB" now.total="125.6 GiB" now.free="114.3 GiB" now.free_swap="327.6 MiB" initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02 dlsym: cuInit - 0x7ed2504c2470 dlsym: cuDriverGetVersion - 0x7ed2504c2490 dlsym: cuDeviceGetCount - 0x7ed2504c24d0 dlsym: cuDeviceGet - 0x7ed2504c24b0 dlsym: cuDeviceGetAttribute - 0x7ed2504c25b0 dlsym: cuDeviceGetUuid - 0x7ed2504c2510 dlsym: cuDeviceGetName - 0x7ed2504c24f0 dlsym: cuCtxCreate_v3 - 0x7ed2504ca170 dlsym: cuMemGetInfo_v2 - 0x7ed2504d5640 dlsym: cuCtxDestroy - 0x7ed250524640 calling cuInit calling cuDriverGetVersion raw version 0x2ef4 CUDA driver version: 12.2 calling cuDeviceGetCount device count 1 time=2025-06-14T20:24:20.465-04:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d name="NVIDIA RTX A5000" overhead="0 B" before.total="23.7 GiB" before.free="21.9 GiB" now.total="23.7 GiB" now.free="21.9 GiB" now.used="1.8 GiB" releasing cuda driver library time=2025-06-14T20:24:20.465-04:00 level=INFO source=server.go:135 msg="system memory" total="125.6 GiB" free="114.3 GiB" free_swap="327.6 MiB" time=2025-06-14T20:24:20.465-04:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[21.9 GiB]" time=2025-06-14T20:24:20.467-04:00 level=INFO source=server.go:168 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split="" memory.available="[21.9 GiB]" memory.gpu_overhead="0 B" memory.required.full="19.9 GiB" memory.required.partial="19.9 GiB" memory.required.kv="944.0 MiB" memory.required.allocations="[19.9 GiB]" memory.weights.total="16.0 GiB" memory.weights.repeating="13.4 GiB" memory.weights.nonrepeating="2.6 GiB" memory.graph.full="522.5 MiB" memory.graph.partial="1.6 GiB" projector.weights="806.2 MiB" projector.graph="1.0 GiB" time=2025-06-14T20:24:20.467-04:00 level=DEBUG source=server.go:284 msg="compatible gpu libraries" compatible="[cuda_v12 cuda_v11]" time=2025-06-14T20:24:20.539-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:24:20.541-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106 time=2025-06-14T20:24:20.541-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=server.go:360 msg="adding gpu library" path=/usr/local/lib/ollama/cuda_v12 time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=server.go:367 msg="adding gpu dependency paths" paths=[/usr/local/lib/ollama/cuda_v12] time=2025-06-14T20:24:20.544-04:00 level=INFO source=server.go:431 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 --ctx-size 4096 --batch-size 512 --n-gpu-layers 63 --threads 32 --parallel 1 --port 43009" time=2025-06-14T20:24:20.544-04:00 level=DEBUG source=server.go:432 msg=subprocess OLLAMA_LOAD_TIMEOUT=120m OLLAMA_HOST=http://localhost:11433 OLLAMA_DEBUG=1 CUDA_VISIBLE_DEVICES=GPU-ac66182e-fbc0-516e-a4c6-d0231a67ad6d OLLAMA_KEEP_ALIVE=-1 PATH=/home/rohanshad/.local/bin:/home/rohanshad/anaconda3/envs/cmr_dev/bin:/home/rohanshad/anaconda3/condabin:/home/rohanshad/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin OLLAMA_MODELS=/usr/share/ollama/.ollama/models OLLAMA_FLASH_ATTENTION=0 OLLAMA_NUM_PARALLEL=1 OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12 LD_LIBRARY_PATH=/usr/local/lib/ollama/cuda_v12:/usr/local/lib/ollama/cuda_v12:/usr/local/lib/ollama:/usr/local/lib/ollama time=2025-06-14T20:24:20.545-04:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 time=2025-06-14T20:24:20.545-04:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding" time=2025-06-14T20:24:20.545-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding" time=2025-06-14T20:24:20.561-04:00 level=INFO source=runner.go:925 msg="starting ollama engine" time=2025-06-14T20:24:20.562-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:43009" time=2025-06-14T20:24:20.616-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:24:20.618-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.name default="" time=2025-06-14T20:24:20.618-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.description default="" time=2025-06-14T20:24:20.618-04:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_0 name="" description="" num_tensors=1247 num_key_values=40 time=2025-06-14T20:24:20.618-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so time=2025-06-14T20:24:20.623-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v12 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA RTX A5000, compute capability 8.6, VMM: yes load_backend: loaded CUDA backend from /usr/local/lib/ollama/cuda_v12/libggml-cuda.so time=2025-06-14T20:24:20.698-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-06-14T20:24:20.797-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model" time=2025-06-14T20:24:20.864-04:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CUDA0 size="16.8 GiB" time=2025-06-14T20:24:20.864-04:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CPU size="2.6 GiB" time=2025-06-14T20:24:20.864-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106 time=2025-06-14T20:24:20.864-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-06-14T20:24:20.867-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-06-14T20:24:21.106-04:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=972 splits=1 time=2025-06-14T20:24:21.106-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="1.1 GiB" time=2025-06-14T20:24:21.106-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="0 B" time=2025-06-14T20:24:21.133-04:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=2737 splits=2 time=2025-06-14T20:24:21.133-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="1.1 GiB" time=2025-06-14T20:24:21.133-04:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="10.5 MiB" time=2025-06-14T20:24:21.134-04:00 level=DEBUG source=runner.go:883 msg=memory allocated.InputWeights=2818572288A allocated.CPU.Graph=11010048A allocated.CUDA0.Weights="[232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 232331520A 3676309632A]" allocated.CUDA0.Cache="[12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 12582912A 12582912A 12582912A 33554432A 12582912A 12582912A 0U]" allocated.CUDA0.Graph=1190150144A time=2025-06-14T20:24:21.299-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.06" time=2025-06-14T20:24:21.550-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.15" time=2025-06-14T20:24:21.801-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.22" time=2025-06-14T20:24:22.052-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.30" time=2025-06-14T20:24:22.303-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.38" time=2025-06-14T20:24:22.554-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.46" time=2025-06-14T20:24:22.805-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.54" time=2025-06-14T20:24:23.056-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.61" time=2025-06-14T20:24:23.307-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.69" time=2025-06-14T20:24:23.558-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.77" time=2025-06-14T20:24:23.809-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.85" time=2025-06-14T20:24:24.060-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.87" time=2025-06-14T20:24:24.311-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.89" time=2025-06-14T20:24:24.561-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.91" time=2025-06-14T20:24:24.812-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.93" time=2025-06-14T20:24:25.063-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.95" time=2025-06-14T20:24:25.313-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.97" time=2025-06-14T20:24:25.564-04:00 level=DEBUG source=server.go:636 msg="model load progress 0.99" time=2025-06-14T20:24:25.815-04:00 level=INFO source=server.go:630 msg="llama runner started in 5.27 seconds" time=2025-06-14T20:24:25.815-04:00 level=DEBUG source=sched.go:495 msg="finished setting up" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:24:25.815-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3158 format="\"json\"" time=2025-06-14T20:24:26.077-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:24:26.077-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=0 prompt=791 used=0 remaining=791 [GIN] 2025/06/14 - 20:24:41 | 200 | 21.398847101s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:24:41.213-04:00 level=DEBUG source=sched.go:503 msg="context for request finished" time=2025-06-14T20:24:41.213-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:24:41.213-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:24:41.270-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:24:41.272-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:24:41.272-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=2979 format="\"json\"" time=2025-06-14T20:24:41.513-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:24:41.513-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1170 prompt=759 used=0 remaining=759 [GIN] 2025/06/14 - 20:24:56 | 200 | 15.389986769s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:24:56.606-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:24:56.606-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:24:56.606-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:24:56.650-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:24:56.652-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:24:56.653-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3331 format="\"json\"" time=2025-06-14T20:24:56.823-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:24:56.824-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1138 prompt=870 used=0 remaining=870 [GIN] 2025/06/14 - 20:25:13 | 200 | 16.622287091s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:25:13.232-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:25:13.232-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:25:13.232-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:25:13.292-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:25:13.294-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:25:13.294-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=4148 format="\"json\"" time=2025-06-14T20:25:13.527-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:25:13.527-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1285 prompt=931 used=0 remaining=931 [GIN] 2025/06/14 - 20:25:30 | 200 | 16.87517067s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:25:30.113-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:25:30.113-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:25:30.113-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:25:30.172-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:25:30.174-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:25:30.174-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=2802 format="\"json\"" time=2025-06-14T20:25:30.407-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:25:30.407-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1346 prompt=721 used=0 remaining=721 [GIN] 2025/06/14 - 20:25:43 | 200 | 13.766561764s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:25:43.884-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:25:43.884-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:25:43.884-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:25:43.940-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:25:43.943-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:25:43.944-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=2965 format="\"json\"" time=2025-06-14T20:25:44.179-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:25:44.180-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1061 prompt=753 used=0 remaining=753 [GIN] 2025/06/14 - 20:26:00 | 200 | 16.590631273s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:26:00.479-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:26:00.479-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:26:00.479-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:26:00.534-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:26:00.536-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:26:00.536-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3681 format="\"json\"" time=2025-06-14T20:26:00.756-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:26:00.756-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1168 prompt=902 used=0 remaining=902 [GIN] 2025/06/14 - 20:26:17 | 200 | 16.851443096s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:26:17.335-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:26:17.335-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:26:17.335-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:26:17.391-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:26:17.393-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:26:17.394-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=4189 format="\"json\"" time=2025-06-14T20:26:17.617-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:26:17.617-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1317 prompt=1121 used=0 remaining=1121 [GIN] 2025/06/14 - 20:26:33 | 200 | 15.704443348s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:26:33.045-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:26:33.045-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:26:33.045-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:26:33.104-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:26:33.106-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:26:33.106-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3249 format="\"json\"" time=2025-06-14T20:26:33.319-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:26:33.319-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1500 prompt=832 used=0 remaining=832 [GIN] 2025/06/14 - 20:26:48 | 200 | 15.555706737s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:26:48.605-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:26:48.605-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:26:48.605-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:26:48.642-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:26:48.644-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:26:48.644-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=6622 format="\"json\"" time=2025-06-14T20:26:48.878-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:26:48.878-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1211 prompt=1567 used=0 remaining=1567 [GIN] 2025/06/14 - 20:27:05 | 200 | 16.591033158s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:27:05.200-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:27:05.201-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:27:05.201-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:27:05.255-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:27:05.257-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:27:05.257-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3115 format="\"json\"" time=2025-06-14T20:27:05.464-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:27:05.464-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1946 prompt=780 used=0 remaining=780 [GIN] 2025/06/14 - 20:27:20 | 200 | 15.367599033s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:27:20.573-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:27:20.573-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:27:20.573-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:27:20.631-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:27:20.633-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:27:20.633-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3543 format="\"json\"" time=2025-06-14T20:27:20.876-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:27:20.876-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1159 prompt=815 used=0 remaining=815 [GIN] 2025/06/14 - 20:27:35 | 200 | 15.308709032s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:27:35.886-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:27:35.886-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:27:35.886-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:27:35.921-04:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 time=2025-06-14T20:27:35.922-04:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:27:35.923-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=4373 format="\"json\"" time=2025-06-14T20:27:36.118-04:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] time=2025-06-14T20:27:36.118-04:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=1194 prompt=1036 used=0 remaining=1036 [GIN] 2025/06/14 - 20:27:51 | 200 | 15.496595172s | 127.0.0.1 | POST "/api/chat" time=2025-06-14T20:27:51.388-04:00 level=DEBUG source=sched.go:434 msg="context for request finished" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 time=2025-06-14T20:27:51.388-04:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 duration=2562047h47m16.854775807s time=2025-06-14T20:27:51.388-04:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/gemma3:27b-it-qat runner.inference=cuda runner.devices=1 runner.size="19.9 GiB" runner.vram="19.9 GiB" runner.parallel=1 runner.pid=3566569 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 runner.num_ctx=4096 refCount=0 time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=sched.go:872 msg="shutting down runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-ccc0cddac56136ef0969cf2e3e9ac051124c937be42503b47ec570dead85ff87 time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=server.go:1023 msg="stopping llama server" pid=3566569 time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=server.go:1029 msg="waiting for llama server to exit" pid=3566569 time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=sched.go:122 msg="shutting down scheduler pending loop" time=2025-06-14T20:27:51.561-04:00 level=DEBUG source=sched.go:322 msg="shutting down scheduler completed loop" time=2025-06-14T20:27:52.053-04:00 level=DEBUG source=server.go:1033 msg="llama server stopped" pid=3566569 ```

GiteaMirror commented

2026-05-04 17:07:44 -05:00

@rohanshad commented on GitHub (Jun 15, 2025):

Actually realized that with v0.9.0 things appear to stabilize after about 45 minutes for me which is new behavior and I just wasn't running it long enough to see that change. Thanks for showing that second graph that helped a lot!

@rohanshad commented on GitHub (Jun 15, 2025): Actually realized that with v0.9.0 things appear to stabilize after about 45 minutes for me which is new behavior and I just wasn't running it long enough to see that change. Thanks for showing that second graph that helped a lot!

Sign in to join this conversation.

Branches Tags

main

hoyyeva/anthropic-local-image-path

dhiltgen/ci

dhiltgen/llama-runner

parth-remove-claude-desktop-launch

hoyyeva/anthropic-reference-images-path

parth-anthropic-reference-images-path

brucemacd/download-before-remove

hoyyeva/editor-config-repair

parth-mlx-decode-checkpoints

parth-launch-codex-app

hoyyeva/fix-codex-model-metadata-warning

hoyyeva/qwen

parth/hide-claude-desktop-till-release

hoyyeva/opencode-image-modality

parth-add-claude-code-autoinstall

release_v0.22.0

pdevine/manifest-list

codex/fix-codex-model-metadata-warning

pdevine/addressable-manifest

brucemacd/launch-fetch-reccomended

jmorganca/llama-compat

launch-copilot-cli

hoyyeva/opencode-thinking

release_v0.20.7

parth-auto-save-backup

parth-test

jmorganca/gemma4-audio-replacements

fix-manifest-digest-on-pull

hoyyeva/vscode-improve

brucemacd/install-server-wait

parth/update-claude-docs

brucemac/start-ap-install

pdevine/mlx-update

pdevine/qwen35_vision

drifkin/api-show-fallback

mintlify/image-generation-1773352582

hoyyeva/server-context-length-local-config

jmorganca/faster-reptition-penalties

jmorganca/convert-nemotron

parth-pi-thinking

pdevine/sampling-penalties

jmorganca/fix-create-quantization-memory

dongchen/resumable_transfer_fix

pdevine/sampling-cache-error

jessegross/mlx-usage

hoyyeva/openclaw-config

hoyyeva/app-html

pdevine/qwen3next

brucemacd/sign-sh-install

brucemacd/tui-update

brucemacd/usage-api

jmorganca/launch-empty

fix-app-dist-embed

mxyng/mlx-compile

mxyng/mlx-quant

mxyng/mlx-glm4.7

mxyng/mlx

brucemacd/simplify-model-picker

jmorganca/qwen3-concurrent

fix-glm-4.7-flash-mla-config

drifkin/qwen3-coder-opening-tag

brucemacd/usage-cli

fix-cuda12-fattn-shmem

ollama-imagegen-docs

parth/fix-multiline-inputs

brucemacd/config-docs

mxyng/model-files

mxyng/simple-execute

fix-imagegen-ollama-models

mxyng/async-upload

jmorganca/lazy-no-dtype-changes

imagegen-auto-detect-create

parth/decrease-concurrent-download-hf

fix-mlx-quantize-init

jmorganca/x-cleanup

usage

imagegen-readme

jmorganca/glm-image

mlx-gpu-cd

jmorganca/imagegen-modelfile

parth/agent-skills

parth/agent-allowlist

parth/signed-in-offline

parth/agents

parth/fix-context-chopping

improve-cloud-flow

parth/add-models-websearch

parth/prompt-renderer-mcp

jmorganca/native-settings

jmorganca/download-stream-hash

jmorganca/client2-rebased

brucemacd/oai-chat-req-multipart

jessegross/multi_chunk_reserve

grace/additional-omit-empty

grace/mistral-3-large

mxyng/tokenizer2

mxyng/tokenizer

jessegross/flash

hoyyeva/windows-nacked-app

mxyng/cleanup-attention

grace/deepseek-parser

hoyyeva/remember-unsent-prompt

parth/add-lfs-pointer-error-conversion

parth/olmo2-test2

hoyyeva/ollama-launchagent-plist

nicole/olmo-model

parth/olmo-test

mxyng/remove-embedded

parth/render-template

jmorganca/intellect-3

parth/remove-prealloc-linter

jmorganca/cmd-eval

nicole/nomic-embed-text-fix

mxyng/lint-2

hoyyeva/add-gemini-3-pro-preview

hoyyeva/load-model-list

mxyng/expand-path

mxyng/environ-2

hoyyeva/deeplink-json-encoding

parth/improve-tool-calling-tests

hoyyeva/conversation

hoyyeva/assistant-edit-response

hoyyeva/thinking

origin/brucemacd/invalid-char-i-err

parth/improve-tool-calling

jmorganca/required-omitempty

grace/qwen3-vl-tests

mxyng/iter-client

parth/docs-readme

nicole/embed-test

pdevine/integration-benchstat

parth/remove-generate-cmd

parth/add-toolcall-id

mxyng/server-tests

jmorganca/glm-4.6

jmorganca/gin-h-compat

drifkin/stable-tool-args

pdevine/qwen3-more-thinking

parth/add-websearch-client

nicole/websearch_local

jmorganca/qwen3-coder-updates

grace/deepseek-v3-migration-tests

mxyng/fix-create

jmorganca/cloud-errors

pdevine/parser-tidy

revert-12233-parth/simplify-entrypoints-runner

parth/enable-so-gpt-oss

brucemacd/qwen3vl

jmorganca/readme-simplify

parth/gpt-oss-structured-outputs

revert-12039-jmorganca/tools-braces

mxyng/embeddings

mxyng/gguf

mxyng/benchmark

mxyng/types-null

parth/move-parsing

mxyng/gemma2

jmorganca/docs

mxyng/16-bit

mxyng/create-stdin

pdevine/authorizedkeys

mxyng/quant

parth/opt-in-error-context-window

brucemacd/cache-models

brucemacd/runner-completion

jmorganca/llama-update-6

brucemacd/benchmark-list

brucemacd/partial-read-caps

parth/deepseek-r1-tools

mxyng/omit-array

parth/tool-prefix-temp

brucemacd/runner-test

jmorganca/qwen25vl

brucemacd/model-forward-test-ext

parth/python-function-parsing

jmorganca/cuda-compression-none

drifkin/num-parallel

drifkin/chat-truncation-fix

jmorganca/sync

parth/python-tools-calling

drifkin/array-head-count

brucemacd/create-no-loop

parth/server-enable-content-stream-with-tools

qwen25omni

mxyng/v3

brucemacd/ropeconfig

jmorganca/silence-tokenizer

parth/sample-so-test

parth/sampling-structured-outputs

brucemacd/doc-go-engine

parth/constrained-sampling-json

jmorganca/mistral-wip

brucemacd/mistral-small-convert

parth/sample-unmarshal-json-for-params

brucemacd/jomorganca/mistral

pdevine/bfloat16

jmorganca/mistral

brucemacd/mistral

pdevine/logging

parth/sample-correctness-fix

parth/sample-fix-sorting

jmorgan/sample-fix-sorting-extras

jmorganca/temp-0-images

brucemacd/parallel-embed-models

brucemacd/shim-grammar

jmorganca/fix-gguf-error

bmizerany/nameswork

jmorganca/faster-releases

bmizerany/validatenames

brucemacd/err-no-vocab

brucemacd/rope-config

brucemacd/err-hint

brucemacd/qwen2_5

brucemacd/logprobs

brucemacd/new_runner_graph_bench

progress-flicker

brucemacd/forward-test

brucemacd/go_qwen2

pdevine/gemma2

jmorganca/add-missing-symlink-eval

mxyng/next-debug

parth/set-context-size-openai

brucemacd/next-bpe-bench

brucemacd/next-bpe-test

brucemacd/new_runner_e2e

brucemacd/new_runner_qwen2

pdevine/convert-cohere2

brucemacd/convert-cli

parth/log-probs

mxyng/next-mlx

mxyng/cmd-history

parth/templating

parth/tokenize-detokenize

brucemacd/check-key-register

bmizerany/grammar

jmorganca/vendor-081b29bd

mxyng/func-checks

jmorganca/fix-null-format

parth/fix-default-to-warn-json

jmorganca/qwen2vl

jmorganca/no-concat

parth/cmd-cleanup-SO

brucemacd/check-key-register-structured-err

parth/openai-stream-usage

parth/fix-referencing-so

stream-tools-stop

jmorganca/degin-1

brucemacd/install-path-clean

brucemacd/push-name-validation

brucemacd/browser-key-register

jmorganca/openai-fix-first-message

jmorganca/fix-proxy

jessegross/sample

parth/disallow-streaming-tools

dhiltgen/remove_submodule

jmorganca/ga

jmorganca/mllama

pdevine/newlines

pdevine/geems-2b

jmorganca/llama-bump

mxyng/modelname-7

mxyng/gin-slog

mxyng/modelname-6

jyan/convert-prog

jyan/quant5

paligemma-support

pdevine/import-docs

jmorganca/openai-context

jyan/paligemma

jyan/p2

jyan/palitest

bmizerany/embedspeedup

jmorganca/llama-vit

brucemacd/allow-ollama

royh/ep-methods

royh/whisper

mxyng/api-models

mxyng/fix-memory

jyan/q4_4/8

jyan/ollama-v

royh/stream-tools

roy-embed-parallel

bmizerany/hrm

revert-5963-revert-5924-mxyng/llama3.1-rope

royh/embed-viz

jyan/local2

jyan/auth

jyan/local

jyan/parse-temp

jmorganca/template-mistral

jyan/reord-g

royh-openai-suffixdocs

royh-imgembed

royh-embed-parallel

jyan/quant4

royh-precision

jyan/progress

pdevine/fix-template

jyan/quant3

pdevine/ggla

mxyng/update-registry-domain

jmorganca/ggml-static

mxyng/create-context

jyan/v0.146

mxyng/layers-from-files

build_dist

bmizerany/noseek

royh-ls

royh-name

timeout

mxyng/server-timestamp

bmizerany/nosillyggufslurps

royh-params

jmorganca/llama-cpp-7c26775

royh-openai-delete

royh-show-rigid

jmorganca/enable-fa

jmorganca/no-error-template

jyan/format

royh-testdelete

bmizerany/fastverify

language_support

pdevine/ps-glitches

brucemacd/tokenize

bruce/iq-quants

bmizerany/filepathwithcoloninhost

mxyng/split-bin

bmizerany/client-registry

jmorganca/if-none-match

native

jmorganca/native

jmorganca/batch-embeddings

jmorganca/initcmake

jmorganca/mm

pdevine/showggmlinfo

modenameenforcealphanum

bmizerany/modenameenforcealphanum

jmorganca/done-reason

jmorganca/llama-cpp-8960fe8

ollama.com

bmizerany/filepathnobuild

bmizerany/types/model/defaultfix

rmdisplaylong

nogogen

bmizerany/x

modelfile-readme

bmizerany/replacecolon

jmorganca/limit

jmorganca/execstack

jmorganca/replace-assets

mxyng/tune-concurrency

jmorganca/testing

whitespace-detection

jmorganca/options

upgrade-all

scratch

cuda-search

mattw/airenamer

mattw/allmodelsonhuggingface

mattw/quantcontext

mattw/whatneedstorun

brucemacd/llama-mem-calc

mattw/faq-context

mattw/communitylinks

mattw/noprune

mattw/python-functioncalling

rename

mxyng/install

pulse

remove-first

editor

mattw/selfqueryingretrieval

cgo

mattw/howtoquant

api

matt/streamingapi

format-config

mxyng/extra-args

shell

update-nous-hermes

cp-model

upload-progress

fix-unknown-model

fix-model-names

delete-fix

insecure-registry

ls

deletemodels

progressbar

readme-updates

license-layers

skip-list

list-models

modelpath

matt/examplemodelfiles

distribution

go-opts

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: github-starred/ollama#69085