[GH-ISSUE #11240] WSL2 Multi-GPU: Prefill falls back to CPU-only, token generation uses GPUs normally #69464

Open
opened 2026-05-04 18:11:22 -05:00 by GiteaMirror · 3 comments

Originally created by @eowensai on GitHub (Jun 30, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11240

What is the issue?

Environment:
Host OS: Windows 11 Pro
WSL2 Distribution: Ubuntu 24.04
GPU: 2x NVIDIA GeForce RTX 3060 (12GB)
NVIDIA Driver Version: 576.80 (on Windows Host)
Model: gemma-3-4b-it-qat.gguf

What happened?
In this WSL2 environment, Ollama consistently uses the CPU for the initial prefill phase of prompt processing. For large prompts, this CPU-bound prefill can take several minutes to complete, during which nvidia-smi reports 0% GPU utilization (although the full model can be seen sharded onto the GPUs). Once this phase is over, token generation correctly utilizes the GPUs and achieves the expected performance.

I've only seen this with Ollama running in the WSL2 environment; it is not reproducible when running Ollama natively on the Windows host.

The behavior persists even when num_ctx is reduced from 128k to 8k. The issue also occurs in v0.9.3 (the version I started with; downgrading to v0.9.2 was a troubleshooting step).
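
For reproducibility, here is a minimal sketch of how a request with the reduced context window can be issued against the API. The model name is taken from the runner logs below; the prompt itself is illustrative:

```shell
# Hedged sketch: request a completion with num_ctx reduced to 8k.
# Model name comes from the runner logs below; the prompt is a placeholder.
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "hfs-ai-model-test:latest",
  "prompt": "<large prompt here>",
  "options": { "num_ctx": 8192 }
}'
```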

Several environment and system configurations were also tested, with no change in the prefill behavior, including the following (a combined sketch of these overrides appears after the list):

  • Using numactl to set the memory interleave policy.
  • Explicitly setting CUDA_VISIBLE_DEVICES=0,1.
  • Forcing the KV cache type with OLLAMA_KV_CACHE_TYPE=f16 in an attempt to address a "kv cache type not supported" warning found in logs.
  • Attempting to resolve potential library path issues via symbolic links (ln -s) and by explicitly setting LD_LIBRARY_PATH.
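
For reference, a combined sketch of these overrides as a systemd drop-in, with illustrative values (the numactl wrapper and the library paths match what the service logs below show; none of this changed the prefill behavior):

```shell
# Sketch of the overrides tried, expressed as a systemd drop-in (values illustrative):
sudo systemctl edit ollama.service
# [Service]
# Environment="CUDA_VISIBLE_DEVICES=0,1"
# Environment="OLLAMA_KV_CACHE_TYPE=f16"
# Environment="LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/lib/ollama"
# ExecStart=
# ExecStart=/usr/bin/numactl --interleave=all /usr/local/bin/ollama serve
sudo systemctl daemon-reload && sudo systemctl restart ollama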

The core issue appears to be a failure within the Ollama runner process to initialize a GPU-compatible backend for the prefill task, forcing a fallback to the CPU, despite being able to use the GPU for the decoding task.
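
This reading is consistent with two lines in the debug log below: cuInit err: 100 from the stale distro-level /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02, and msg="compatible gpu libraries" compatible=[] at server.go:284. A hedged set of sanity checks I ran from inside the WSL2 distribution, assuming the standard WSL library paths seen in the log:

```shell
# Sanity checks inside the WSL2 distro; paths are the standard WSL ones from the log.
nvidia-smi                                    # confirms the Windows driver is reachable via /usr/lib/wsl/lib
ls -l /usr/lib/wsl/lib/libcuda.so*            # WSL-provided CUDA driver library (the one that initializes here)
ls -l /usr/lib/x86_64-linux-gnu/libcuda.so*   # a stale distro libcuda here fails cuInit (err 100) and may shadow the WSL one
journalctl -u ollama | grep -E 'compatible gpu libraries|cuInit err'
```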

What did you expect to happen?
The prefill computation for all prompts should be executed on the GPU, leveraging the available VRAM and compute capabilities. CPU usage should remain low during prefill, and nvidia-smi should show high GPU utilization, resulting in fast prompt processing times, consistent with the performance of native Windows Ollama.

Relevant log output

Jun 29 20:22:04 Aidev systemd[1]: Stopping ollama.service - Ollama Service (v0.9.2 Manual Install)...
Jun 29 20:22:04 Aidev numactl[97909]: time=2025-06-29T20:22:04.904-07:00 level=DEBUG source=sched.go:872 msg="shutting down runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd
Jun 29 20:22:04 Aidev numactl[97909]: time=2025-06-29T20:22:04.904-07:00 level=DEBUG source=server.go:1024 msg="stopping llama server" pid=110636
Jun 29 20:22:04 Aidev numactl[97909]: time=2025-06-29T20:22:04.904-07:00 level=DEBUG source=server.go:1030 msg="waiting for llama server to exit" pid=110636
Jun 29 20:22:04 Aidev numactl[97909]: time=2025-06-29T20:22:04.904-07:00 level=DEBUG source=sched.go:322 msg="shutting down scheduler completed loop"
Jun 29 20:22:04 Aidev numactl[97909]: time=2025-06-29T20:22:04.904-07:00 level=DEBUG source=sched.go:122 msg="shutting down scheduler pending loop"
Jun 29 20:22:05 Aidev numactl[97909]: time=2025-06-29T20:22:05.043-07:00 level=DEBUG source=server.go:1034 msg="llama server stopped" pid=110636
Jun 29 20:22:05 Aidev systemd[1]: ollama.service: Deactivated successfully.
Jun 29 20:22:05 Aidev systemd[1]: Stopped ollama.service - Ollama Service (v0.9.2 Manual Install).
Jun 29 20:22:05 Aidev systemd[1]: ollama.service: Consumed 17min 59.851s CPU time, 5.5G memory peak, 0B memory swap peak.
Jun 29 20:22:05 Aidev systemd[1]: Started ollama.service - Ollama Service (v0.9.2 Manual Install).
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.110-07:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2562047h47m16.854775807s OLLAMA_KV_CACHE_TYPE:f16 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.111-07:00 level=INFO source=images.go:480 msg="total blobs: 8"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.111-07:00 level=INFO source=images.go:487 msg="total unused blobs removed: 0"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.111-07:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.2)"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.112-07:00 level=DEBUG source=sched.go:108 msg="starting llm scheduler"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.112-07:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.112-07:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.112-07:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so*
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.112-07:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/usr/local/lib/ollama/libcuda.so* /usr/share/ollama/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.940-07:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths="[/usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02 /usr/lib/wsl/lib/libcuda.so /usr/lib/wsl/lib/libcuda.so.1 /usr/lib/wsl/lib/libcuda.so.1.1 /usr/lib/wsl/drivers/nv_dispig.inf_amd64_0afec3f2050014a0/libcuda.so.1.1 /usr/lib/wsl/drivers/nv_dispsi.inf_amd64_bc81edc432675578/libcuda.so.1.1]"
Jun 29 20:22:05 Aidev numactl[115262]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuInit - 0x73202d4c2470
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202d4c2490
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202d4c24d0
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202d4c24b0
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202d4c25b0
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202d4c2510
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202d4c24f0
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202d4ca170
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202d4d5640
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202d524640
Jun 29 20:22:05 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:05 Aidev numactl[115262]: cuInit err: 100
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.960-07:00 level=INFO source=gpu.go:602 msg="no nvidia devices detected by library /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02"
Jun 29 20:22:05 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:05 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:06 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:06 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:06 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:06 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:06 Aidev numactl[115262]: device count 2
Jun 29 20:22:06 Aidev numactl[115262]: time=2025-06-29T20:22:06.011-07:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=2 library=/usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:06 Aidev numactl[115262]: [GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea] CUDA totalMem 12287mb
Jun 29 20:22:06 Aidev numactl[115262]: [GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea] CUDA freeMem 11247mb
Jun 29 20:22:06 Aidev numactl[115262]: [GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea] Compute Capability 8.6
Jun 29 20:22:06 Aidev numactl[115262]: [GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f] CUDA totalMem 12287mb
Jun 29 20:22:06 Aidev numactl[115262]: [GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f] CUDA freeMem 11242mb
Jun 29 20:22:06 Aidev numactl[115262]: [GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f] Compute Capability 8.6
Jun 29 20:22:06 Aidev numactl[115262]: time=2025-06-29T20:22:06.252-07:00 level=DEBUG source=amd_linux.go:419 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 29 20:22:06 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:06 Aidev numactl[115262]: time=2025-06-29T20:22:06.252-07:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea library=cuda variant=v12 compute=8.6 driver=12.9 name="NVIDIA GeForce RTX 3060" total="12.0 GiB" available="11.0 GiB"
Jun 29 20:22:06 Aidev numactl[115262]: time=2025-06-29T20:22:06.252-07:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f library=cuda variant=v12 compute=8.6 driver=12.9 name="NVIDIA GeForce RTX 3060" total="12.0 GiB" available="11.0 GiB"
Jun 29 20:22:39 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:39 | 200 |      81.082µs |       127.0.0.1 | GET      "/"
Jun 29 20:22:44 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:44 | 200 |      22.772µs |       127.0.0.1 | GET      "/"
Jun 29 20:22:44 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:44 | 200 |      19.863µs |       127.0.0.1 | GET      "/"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.166-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.166-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:44 Aidev numactl[115262]: device count 2
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.265-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.373-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.373-07:00 level=DEBUG source=sched.go:185 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=6 gpu_count=2
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.388-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.402-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.403-07:00 level=DEBUG source=sched.go:228 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.403-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[11.0 GiB]"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.404-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:44 Aidev numactl[115262]: device count 2
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.505-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.596-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.597-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[11.0 GiB]"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.597-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:44 Aidev numactl[115262]: device count 2
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.686-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.776-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.776-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[11.0 GiB]"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.777-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:44 Aidev numactl[115262]: device count 2
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.865-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.957-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.958-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[11.0 GiB]"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.958-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:44 Aidev numactl[115262]: device count 2
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.046-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.131-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.132-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=2 available="[11.0 GiB 11.0 GiB]"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.133-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:45 Aidev numactl[115262]: device count 2
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.213-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.298-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.298-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=2 available="[11.0 GiB 11.0 GiB]"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.299-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:45 Aidev numactl[115262]: device count 2
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.381-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.465-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.466-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:45 Aidev numactl[115262]: device count 2
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.543-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="30.0 GiB" free_swap="8.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=2 available="[11.0 GiB 11.0 GiB]"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:45 Aidev numactl[115262]: device count 2
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.710-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.792-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.793-07:00 level=INFO source=server.go:168 msg=offload library=cuda layers.requested=999 layers.model=35 layers.offload=35 layers.split=18,17 memory.available="[11.0 GiB 11.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="8.1 GiB" memory.required.partial="8.1 GiB" memory.required.kv="334.0 MiB" memory.required.allocations="[5.5 GiB 2.6 GiB]" memory.weights.total="2.9 GiB" memory.weights.repeating="1.7 GiB" memory.weights.nonrepeating="1.3 GiB" memory.graph.full="1.0 GiB" memory.graph.partial="1.0 GiB" projector.weights="806.2 MiB" projector.graph="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.793-07:00 level=INFO source=server.go:211 msg="enabling flash attention"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.793-07:00 level=DEBUG source=server.go:284 msg="compatible gpu libraries" compatible=[]
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.814-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.815-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.815-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=INFO source=server.go:431 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd --ctx-size 8192 --batch-size 512 --n-gpu-layers 999 --threads 10 --flash-attn --kv-cache-type f16 --parallel 1 --tensor-split 18,17 --port 38963"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=DEBUG source=server.go:432 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/snap/bin OLLAMA_KEEP_ALIVE=-1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=f16 LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/share/ollama:/usr/local/lib/ollama OLLAMA_DEBUG=1 OLLAMA_MAX_LOADED_MODELS=6 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama CUDA_VISIBLE_DEVICES=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea,GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.818-07:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.828-07:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.831-07:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:38963"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.description default=""
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_0 name="Gemma3 4b It Qa_0 Qat Hf" description="" num_tensors=883 num_key_values=42
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
Jun 29 20:22:45 Aidev numactl[115262]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Jun 29 20:22:45 Aidev numactl[115262]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Jun 29 20:22:45 Aidev numactl[115262]: ggml_cuda_init: found 2 CUDA devices:
Jun 29 20:22:45 Aidev numactl[115262]:   Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
Jun 29 20:22:45 Aidev numactl[115262]:   Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
Jun 29 20:22:46 Aidev numactl[115262]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Jun 29 20:22:46 Aidev numactl[115262]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.020-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.069-07:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.117-07:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CUDA0 size="912.0 MiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.117-07:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CUDA1 size="2.8 GiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.117-07:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CPU size="1.3 GiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.118-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.118-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=972 splits=1
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="0 B"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA1 buffer_type=CUDA1 size="1.1 GiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="0 B"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.368-07:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=1369 splits=3
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="131.0 MiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA1 buffer_type=CUDA1 size="1.1 GiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="5.0 MiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=DEBUG source=runner.go:883 msg=memory allocated.InputWeights=1342504960A allocated.CPU.Graph=5242880A allocated.CUDA0.Weights="[53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U]" allocated.CUDA0.Cache="[6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U]" allocated.CUDA0.Graph=137371648A allocated.CUDA1.Weights="[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 2193742976A]" allocated.CUDA1.Cache="[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 0U]" allocated.CUDA1.Graph=1212612608A
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.570-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.12"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.822-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.30"
Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.073-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.41"
Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.324-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.53"
Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.575-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.64"
Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.826-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.74"
Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.077-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.78"
Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.328-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.81"
Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.579-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.85"
Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.830-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.89"
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.081-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.93"
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.331-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.97"
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.582-07:00 level=INFO source=server.go:630 msg="llama runner started in 3.76 seconds"
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.582-07:00 level=DEBUG source=sched.go:495 msg="finished setting up" runner.name=registry.ollama.ai/library/hfs-ai-model-test:latest runner.inference=cuda runner.devices=2 runner.size="8.1 GiB" runner.vram="8.1 GiB" runner.parallel=1 runner.pid=115323 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd runner.num_ctx=8192
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.743-07:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3145 format=""
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.902-07:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.902-07:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=0 prompt=764 used=0 remaining=764
Jun 29 20:22:52 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:52 | 200 |  6.509760274s |       127.0.0.1 | POST     "/v1/chat/completions"
Jun 29 20:22:52 Aidev numactl[115262]: time=2025-06-29T20:22:52.781-07:00 level=DEBUG source=sched.go:503 msg="context for request finished"
Jun 29 20:22:52 Aidev numactl[115262]: time=2025-06-29T20:22:52.781-07:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/hfs-ai-model-test:latest runner.inference=cuda runner.devices=2 runner.size="8.1 GiB" runner.vram="8.1 GiB" runner.parallel=1 runner.pid=115323 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd runner.num_ctx=8192 duration=2562047h47m16.854775807s
Jun 29 20:22:52 Aidev numactl[115262]: time=2025-06-29T20:22:52.781-07:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/hfs-ai-model-test:latest runner.inference=cuda runner.devices=2 runner.size="8.1 GiB" runner.vram="8.1 GiB" runner.parallel=1 runner.pid=115323 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd runner.num_ctx=8192 refCount=0
Jun 29 20:22:52 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:52 | 200 |      28.105µs |       127.0.0.1 | GET      "/"
Jun 29 20:23:02 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:23:02 | 200 |      26.642µs |       127.0.0.1 | GET      "/"
Jun 29 20:23:02 Aidev numactl[115262]: time=2025-06-29T20:23:02.624-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
Jun 29 20:23:02 Aidev numactl[115262]: time=2025-06-29T20:23:02.625-07:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd
Jun 29 20:23:45 Aidev numactl[115262]: time=2025-06-29T20:23:45.176-07:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=18213 format=""
Jun 29 20:23:55 Aidev numactl[115262]: time=2025-06-29T20:23:55.303-07:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
Jun 29 20:23:55 Aidev numactl[115262]: time=2025-06-29T20:23:55.303-07:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=792 prompt=5355 used=63 remaining=5292
Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.054-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache"
Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.342-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache"
Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.544-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache"
Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.892-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache"
Jun 29 20:23:57 Aidev numactl[115262]: time=2025-06-29T20:23:57.101-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache"

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

Logs are from v0.9.2, which I downgraded to from v0.9.3 while troubleshooting.

Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:44 Aidev numactl[115262]: device count 2 Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.686-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.776-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:44 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.776-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[11.0 GiB]" Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.777-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB" Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:44 Aidev numactl[115262]: device count 2 Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.865-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.957-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 
GiB" now.used="1.0 GiB" Jun 29 20:22:44 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.958-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[11.0 GiB]" Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.958-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB" Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:44 Aidev numactl[115262]: device count 2 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.046-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.131-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.132-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=2 available="[11.0 GiB 11.0 GiB]" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.133-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: 
cuDeviceGetName - 0x73202ec21e13 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:45 Aidev numactl[115262]: device count 2 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.213-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.298-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.298-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=2 available="[11.0 GiB 11.0 GiB]" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.299-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:45 Aidev numactl[115262]: device count 2 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.381-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.465-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" 
gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.466-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:45 Aidev numactl[115262]: device count 2 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.543-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="30.0 GiB" free_swap="8.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=2 available="[11.0 GiB 11.0 GiB]" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07 Jun 29 20:22:45 Aidev 
numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:45 Aidev numactl[115262]: device count 2 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.710-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.792-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.793-07:00 level=INFO source=server.go:168 msg=offload library=cuda layers.requested=999 layers.model=35 layers.offload=35 layers.split=18,17 memory.available="[11.0 GiB 11.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="8.1 GiB" memory.required.partial="8.1 GiB" memory.required.kv="334.0 MiB" memory.required.allocations="[5.5 GiB 2.6 GiB]" memory.weights.total="2.9 GiB" memory.weights.repeating="1.7 GiB" memory.weights.nonrepeating="1.3 GiB" memory.graph.full="1.0 GiB" memory.graph.partial="1.0 GiB" projector.weights="806.2 MiB" projector.graph="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.793-07:00 level=INFO source=server.go:211 msg="enabling flash attention" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.793-07:00 level=DEBUG source=server.go:284 msg="compatible gpu libraries" compatible=[] Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.814-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.815-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.815-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1 Jun 29 20:22:45 Aidev numactl[115262]: 
time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=INFO source=server.go:431 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd --ctx-size 8192 --batch-size 512 --n-gpu-layers 999 --threads 10 --flash-attn --kv-cache-type f16 --parallel 1 --tensor-split 18,17 --port 38963" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=DEBUG source=server.go:432 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/snap/bin OLLAMA_KEEP_ALIVE=-1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=f16 LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/share/ollama:/usr/local/lib/ollama OLLAMA_DEBUG=1 OLLAMA_MAX_LOADED_MODELS=6 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama CUDA_VISIBLE_DEVICES=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea,GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.818-07:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.828-07:00 level=INFO source=runner.go:925 msg="starting ollama engine" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.831-07:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:38963" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.description default="" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_0 name="Gemma3 4b It Qa_0 Qat Hf" description="" num_tensors=883 num_key_values=42 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama Jun 29 20:22:45 Aidev numactl[115262]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no Jun 29 20:22:45 Aidev numactl[115262]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no Jun 29 20:22:45 Aidev numactl[115262]: ggml_cuda_init: found 2 CUDA devices: Jun 29 20:22:45 Aidev numactl[115262]: Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes Jun 29 20:22:45 Aidev numactl[115262]: Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes Jun 29 20:22:46 Aidev numactl[115262]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so Jun 29 20:22:46 Aidev numactl[115262]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.020-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 
CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.069-07:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.117-07:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CUDA0 size="912.0 MiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.117-07:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CUDA1 size="2.8 GiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.117-07:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CPU size="1.3 GiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.118-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.118-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=972 splits=1 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="0 B" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA1 buffer_type=CUDA1 size="1.1 GiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="0 B" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.368-07:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=1369 splits=3 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="131.0 MiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA1 buffer_type=CUDA1 size="1.1 GiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="5.0 MiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=DEBUG source=runner.go:883 msg=memory allocated.InputWeights=1342504960A allocated.CPU.Graph=5242880A allocated.CUDA0.Weights="[53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 
53127168A 53127168A 53127168A 53127168A 53127168A 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U]" allocated.CUDA0.Cache="[6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U]" allocated.CUDA0.Graph=137371648A allocated.CUDA1.Weights="[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 2193742976A]" allocated.CUDA1.Cache="[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 0U]" allocated.CUDA1.Graph=1212612608A Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.570-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.12" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.822-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.30" Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.073-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.41" Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.324-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.53" Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.575-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.64" Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.826-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.74" Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.077-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.78" Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.328-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.81" Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.579-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.85" Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.830-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.89" Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.081-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.93" Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.331-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.97" Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.582-07:00 level=INFO source=server.go:630 msg="llama runner started in 3.76 seconds" Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.582-07:00 level=DEBUG source=sched.go:495 msg="finished setting up" runner.name=registry.ollama.ai/library/hfs-ai-model-test:latest runner.inference=cuda runner.devices=2 runner.size="8.1 GiB" runner.vram="8.1 GiB" runner.parallel=1 runner.pid=115323 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd runner.num_ctx=8192 Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.743-07:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3145 format="" Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.902-07:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.902-07:00 
level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=0 prompt=764 used=0 remaining=764 Jun 29 20:22:52 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:52 | 200 | 6.509760274s | 127.0.0.1 | POST "/v1/chat/completions" Jun 29 20:22:52 Aidev numactl[115262]: time=2025-06-29T20:22:52.781-07:00 level=DEBUG source=sched.go:503 msg="context for request finished" Jun 29 20:22:52 Aidev numactl[115262]: time=2025-06-29T20:22:52.781-07:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/hfs-ai-model-test:latest runner.inference=cuda runner.devices=2 runner.size="8.1 GiB" runner.vram="8.1 GiB" runner.parallel=1 runner.pid=115323 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd runner.num_ctx=8192 duration=2562047h47m16.854775807s Jun 29 20:22:52 Aidev numactl[115262]: time=2025-06-29T20:22:52.781-07:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/hfs-ai-model-test:latest runner.inference=cuda runner.devices=2 runner.size="8.1 GiB" runner.vram="8.1 GiB" runner.parallel=1 runner.pid=115323 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd runner.num_ctx=8192 refCount=0 Jun 29 20:22:52 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:52 | 200 | 28.105µs | 127.0.0.1 | GET "/" Jun 29 20:23:02 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:23:02 | 200 | 26.642µs | 127.0.0.1 | GET "/" Jun 29 20:23:02 Aidev numactl[115262]: time=2025-06-29T20:23:02.624-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 Jun 29 20:23:02 Aidev numactl[115262]: time=2025-06-29T20:23:02.625-07:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd Jun 29 20:23:45 Aidev numactl[115262]: time=2025-06-29T20:23:45.176-07:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=18213 format="" Jun 29 20:23:55 Aidev numactl[115262]: time=2025-06-29T20:23:55.303-07:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] Jun 29 20:23:55 Aidev numactl[115262]: time=2025-06-29T20:23:55.303-07:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=792 prompt=5355 used=63 remaining=5292 Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.054-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache" Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.342-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache" Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.544-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache" Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.892-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache" Jun 29 20:23:57 Aidev numactl[115262]: time=2025-06-29T20:23:57.101-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache" ``` ### OS Windows ### GPU Nvidia ### CPU Intel ### Ollama version Logs from 0.9.2 which I had downgraded to from 0.9.3 in trying to troubleshoot.
GiteaMirror added the bug label 2026-05-04 18:11:22 -05:00

@eowensai commented on GitHub (Jul 3, 2025):

I'm providing an update to this issue that resolved one problem seen in the logs, **although the CPU-only prefill persists**.

Initially, the Ollama logs consistently showed an empty list of compatible GPU libraries:

`source=server.go:284 msg="compatible gpu libraries" compatible=[]`

I think this was due to a conflict where libnvidia-ml.so from Ubuntu's libnvidia-compute-535 package was being prioritized over the WSL2 stubs that are actually needed.
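One way to check which copies the dynamic linker resolves (a sanity check sketch; exact output will vary by system):

```
# List every libcuda / libnvidia-ml entry in the linker cache.
# In a working WSL2 setup, the /usr/lib/wsl/lib entries should take priority.
ldconfig -p | grep -E 'libcuda\.so|libnvidia-ml\.so'
```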
The following was done (a sketch of steps 1 and 3 follows the list):

1. **Isolated conflicting libraries**: moved the libnvidia-ml.so* files out of /usr/lib/x86_64-linux-gnu/ into a backup directory, then ran sudo ldconfig.
2. **Installed Ollama's CUDA backend**: downloaded the NVIDIA-enabled ollama-linux-amd64.tgz build and extracted it; created /usr/local/lib/ollama/cuda_v12 and moved the CUDA 12 libraries there (they had extracted under /usr/lib/ollama).
3. **Configured the systemd service**: created a systemd override (/etc/systemd/system/ollama.service.d/override.conf) that explicitly sets the following environment variables:
   - OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama/cuda_v12
   - LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/lib/ollama/cuda_v12
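A minimal sketch of steps 1 and 3, assuming the stock `ollama` systemd unit; the backup directory name is hypothetical:

```
# Step 1 (sketch): move the conflicting Ubuntu NVML libraries aside and
# refresh the linker cache. ~/nvml-backup is a hypothetical location.
mkdir -p ~/nvml-backup
sudo mv /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* ~/nvml-backup/
sudo ldconfig

# Step 3 (sketch): systemd override pointing Ollama at the WSL2 stubs
# and the relocated CUDA 12 backend.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
[Service]
Environment="OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama/cuda_v12"
Environment="LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/lib/ollama/cuda_v12"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```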
After a daemon-reload and service restart, the logs now correctly show that the CUDA backend is found:

`"compatible gpu libraries" compatible=[cuda_v12]`
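To confirm after a restart (assuming Ollama runs under systemd with journal logging, as in the logs above):

```
# Check the current boot's service journal for the GPU-library line.
journalctl -u ollama -b | grep "compatible gpu libraries"
```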


@eowensai commented on GitHub (Jul 3, 2025):

Today's attempts, still no luck getting prefill to run on the GPU(s); a sketch of combining these toggles for a one-off test follows the list:

- Rebooted (it's Windows, after all)
- CUDA_VISIBLE_DEVICES=0 (from 0,1)
- OLLAMA_FLASH_ATTENTION=0 (was 1)
- Tried the gemma3 12b version of the it-qat model (up from 4b)
- OLLAMA_KV_CACHE_TYPE=f16 (previously left undefined; logs now report "OLLAMA_KV_CACHE_TYPE:f16" where before it was "OLLAMA_KV_CACHE_TYPE: ")
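A sketch of a one-off test run combining these toggles outside the systemd service (assumes the service is stopped and `ollama serve` is run manually):

```
# Hypothetical single-run test: one GPU, flash attention off, f16 KV cache,
# debug logging on. Stop the service first so the ports don't collide.
sudo systemctl stop ollama
CUDA_VISIBLE_DEVICES=0 \
OLLAMA_FLASH_ATTENTION=0 \
OLLAMA_KV_CACHE_TYPE=f16 \
OLLAMA_DEBUG=1 \
ollama serve
```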

@Atliac commented on GitHub (Jul 11, 2025):

I have the same issue.

Host OS: Windows 10 (native)
Ollama version: 0.9.6
GPU: NVIDIA GeForce RTX 4060 Ti (8 GB)
Model: gemma-3-4b-it-qat.gguf

Reference: github-starred/ollama#69464