[GH-ISSUE #11240] WSL2 Multi-GPU: Prefill falls back to CPU-only, token generation uses GPUs normally #69464

Open
opened 2026-05-04 18:11:22 -05:00 by GiteaMirror · 3 comments

Originally created by @eowensai on GitHub (Jun 30, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11240

What is the issue?

Environment:
Host OS: Windows 11 Pro
WSL2 Distribution: Ubuntu 24.04
GPU: 2x NVIDIA GeForce RTX 3060 (12GB)
NVIDIA Driver Version: 576.80 (on Windows Host)
Model: gemma-3-4b-it-qat.gguf

What happened?
In this WSL2 environment, Ollama consistently uses the CPU for the initial prefill phase of prompt processing. For large prompts, this CPU-bound prefill can take several minutes to complete, during which nvidia-smi reports 0% GPU utilization (although the full model can be seen sharded onto the GPUs). Once this phase is over, token generation correctly utilizes the GPUs and achieves the expected performance.

I've only seen this with Ollama running in the WSL2 environment; it is not reproducible when running Ollama natively on the Windows host.

The behavior persists even when num_ctx is reduced from 128k to 8k. The issue also occurs in v0.9.3 (the version I started with; downgrading to v0.9.2 was a troubleshooting step).
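
For reproducibility, here is a minimal sketch of how a request with the reduced context window can be issued against the API. The model name is taken from the runner logs below; the prompt itself is illustrative:

```shell
# Hedged sketch: request a completion with num_ctx reduced to 8k.
# Model name comes from the runner logs below; the prompt is a placeholder.
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "hfs-ai-model-test:latest",
  "prompt": "<large prompt here>",
  "options": { "num_ctx": 8192 }
}'
```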

Several environment and system configurations were also tested, with no change in the prefill behavior, including the following (a combined sketch of these overrides appears after the list):

  • Using numactl to set the memory interleave policy.
  • Explicitly setting CUDA_VISIBLE_DEVICES=0,1.
  • Forcing the KV cache type with OLLAMA_KV_CACHE_TYPE=f16 in an attempt to address a "kv cache type not supported" warning found in logs.
  • Attempting to resolve potential library path issues via symbolic links (ln -s) and by explicitly setting LD_LIBRARY_PATH.
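
For reference, a combined sketch of these overrides as a systemd drop-in, with illustrative values (the numactl wrapper and the library paths match what the service logs below show; none of this changed the prefill behavior):

```shell
# Sketch of the overrides tried, expressed as a systemd drop-in (values illustrative):
sudo systemctl edit ollama.service
# [Service]
# Environment="CUDA_VISIBLE_DEVICES=0,1"
# Environment="OLLAMA_KV_CACHE_TYPE=f16"
# Environment="LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/lib/ollama"
# ExecStart=
# ExecStart=/usr/bin/numactl --interleave=all /usr/local/bin/ollama serve
sudo systemctl daemon-reload && sudo systemctl restart ollama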

The core issue appears to be a failure within the Ollama runner process to initialize a GPU-compatible backend for the prefill task, forcing a fallback to the CPU, despite being able to use the GPU for the decoding task.
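
This reading is consistent with two lines in the debug log below: cuInit err: 100 from the stale distro-level /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02, and msg="compatible gpu libraries" compatible=[] at server.go:284. A hedged set of sanity checks I ran from inside the WSL2 distribution, assuming the standard WSL library paths seen in the log:

```shell
# Sanity checks inside the WSL2 distro; paths are the standard WSL ones from the log.
nvidia-smi                                    # confirms the Windows driver is reachable via /usr/lib/wsl/lib
ls -l /usr/lib/wsl/lib/libcuda.so*            # WSL-provided CUDA driver library (the one that initializes here)
ls -l /usr/lib/x86_64-linux-gnu/libcuda.so*   # a stale distro libcuda here fails cuInit (err 100) and may shadow the WSL one
journalctl -u ollama | grep -E 'compatible gpu libraries|cuInit err'
```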

What did you expect to happen?
The prefill computation for all prompts should be executed on the GPU, leveraging the available VRAM and compute capabilities. CPU usage should remain low during prefill, and nvidia-smi should show high GPU utilization, resulting in fast prompt processing times, consistent with the performance of native Windows Ollama.

Relevant log output

Jun 29 20:22:04 Aidev systemd[1]: Stopping ollama.service - Ollama Service (v0.9.2 Manual Install)...
Jun 29 20:22:04 Aidev numactl[97909]: time=2025-06-29T20:22:04.904-07:00 level=DEBUG source=sched.go:872 msg="shutting down runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd
Jun 29 20:22:04 Aidev numactl[97909]: time=2025-06-29T20:22:04.904-07:00 level=DEBUG source=server.go:1024 msg="stopping llama server" pid=110636
Jun 29 20:22:04 Aidev numactl[97909]: time=2025-06-29T20:22:04.904-07:00 level=DEBUG source=server.go:1030 msg="waiting for llama server to exit" pid=110636
Jun 29 20:22:04 Aidev numactl[97909]: time=2025-06-29T20:22:04.904-07:00 level=DEBUG source=sched.go:322 msg="shutting down scheduler completed loop"
Jun 29 20:22:04 Aidev numactl[97909]: time=2025-06-29T20:22:04.904-07:00 level=DEBUG source=sched.go:122 msg="shutting down scheduler pending loop"
Jun 29 20:22:05 Aidev numactl[97909]: time=2025-06-29T20:22:05.043-07:00 level=DEBUG source=server.go:1034 msg="llama server stopped" pid=110636
Jun 29 20:22:05 Aidev systemd[1]: ollama.service: Deactivated successfully.
Jun 29 20:22:05 Aidev systemd[1]: Stopped ollama.service - Ollama Service (v0.9.2 Manual Install).
Jun 29 20:22:05 Aidev systemd[1]: ollama.service: Consumed 17min 59.851s CPU time, 5.5G memory peak, 0B memory swap peak.
Jun 29 20:22:05 Aidev systemd[1]: Started ollama.service - Ollama Service (v0.9.2 Manual Install).
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.110-07:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2562047h47m16.854775807s OLLAMA_KV_CACHE_TYPE:f16 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.111-07:00 level=INFO source=images.go:480 msg="total blobs: 8"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.111-07:00 level=INFO source=images.go:487 msg="total unused blobs removed: 0"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.111-07:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.2)"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.112-07:00 level=DEBUG source=sched.go:108 msg="starting llm scheduler"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.112-07:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.112-07:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.112-07:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so*
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.112-07:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/usr/local/lib/ollama/libcuda.so* /usr/share/ollama/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.940-07:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths="[/usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02 /usr/lib/wsl/lib/libcuda.so /usr/lib/wsl/lib/libcuda.so.1 /usr/lib/wsl/lib/libcuda.so.1.1 /usr/lib/wsl/drivers/nv_dispig.inf_amd64_0afec3f2050014a0/libcuda.so.1.1 /usr/lib/wsl/drivers/nv_dispsi.inf_amd64_bc81edc432675578/libcuda.so.1.1]"
Jun 29 20:22:05 Aidev numactl[115262]: initializing /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuInit - 0x73202d4c2470
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202d4c2490
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202d4c24d0
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202d4c24b0
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202d4c25b0
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202d4c2510
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202d4c24f0
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202d4ca170
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202d4d5640
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202d524640
Jun 29 20:22:05 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:05 Aidev numactl[115262]: cuInit err: 100
Jun 29 20:22:05 Aidev numactl[115262]: time=2025-06-29T20:22:05.960-07:00 level=INFO source=gpu.go:602 msg="no nvidia devices detected by library /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02"
Jun 29 20:22:05 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:05 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:05 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:06 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:06 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:06 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:06 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:06 Aidev numactl[115262]: device count 2
Jun 29 20:22:06 Aidev numactl[115262]: time=2025-06-29T20:22:06.011-07:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=2 library=/usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:06 Aidev numactl[115262]: [GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea] CUDA totalMem 12287mb
Jun 29 20:22:06 Aidev numactl[115262]: [GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea] CUDA freeMem 11247mb
Jun 29 20:22:06 Aidev numactl[115262]: [GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea] Compute Capability 8.6
Jun 29 20:22:06 Aidev numactl[115262]: [GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f] CUDA totalMem 12287mb
Jun 29 20:22:06 Aidev numactl[115262]: [GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f] CUDA freeMem 11242mb
Jun 29 20:22:06 Aidev numactl[115262]: [GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f] Compute Capability 8.6
Jun 29 20:22:06 Aidev numactl[115262]: time=2025-06-29T20:22:06.252-07:00 level=DEBUG source=amd_linux.go:419 msg="amdgpu driver not detected /sys/module/amdgpu"
Jun 29 20:22:06 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:06 Aidev numactl[115262]: time=2025-06-29T20:22:06.252-07:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea library=cuda variant=v12 compute=8.6 driver=12.9 name="NVIDIA GeForce RTX 3060" total="12.0 GiB" available="11.0 GiB"
Jun 29 20:22:06 Aidev numactl[115262]: time=2025-06-29T20:22:06.252-07:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f library=cuda variant=v12 compute=8.6 driver=12.9 name="NVIDIA GeForce RTX 3060" total="12.0 GiB" available="11.0 GiB"
Jun 29 20:22:39 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:39 | 200 |      81.082µs |       127.0.0.1 | GET      "/"
Jun 29 20:22:44 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:44 | 200 |      22.772µs |       127.0.0.1 | GET      "/"
Jun 29 20:22:44 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:44 | 200 |      19.863µs |       127.0.0.1 | GET      "/"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.166-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.166-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:44 Aidev numactl[115262]: device count 2
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.265-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.373-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.373-07:00 level=DEBUG source=sched.go:185 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=6 gpu_count=2
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.388-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.402-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.403-07:00 level=DEBUG source=sched.go:228 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.403-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[11.0 GiB]"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.404-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:44 Aidev numactl[115262]: device count 2
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.505-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.596-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.597-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[11.0 GiB]"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.597-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:44 Aidev numactl[115262]: device count 2
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.686-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.776-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.776-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[11.0 GiB]"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.777-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:44 Aidev numactl[115262]: device count 2
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.865-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.957-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.958-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[11.0 GiB]"
Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.958-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:44 Aidev numactl[115262]: device count 2
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.046-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.131-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.132-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=2 available="[11.0 GiB 11.0 GiB]"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.133-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:45 Aidev numactl[115262]: device count 2
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.213-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.298-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.298-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=2 available="[11.0 GiB 11.0 GiB]"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.299-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:45 Aidev numactl[115262]: device count 2
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.381-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.465-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.466-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:45 Aidev numactl[115262]: device count 2
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.543-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="30.0 GiB" free_swap="8.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=2 available="[11.0 GiB 11.0 GiB]"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d
Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d
Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion
Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a
Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9
Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount
Jun 29 20:22:45 Aidev numactl[115262]: device count 2
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.710-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.792-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.793-07:00 level=INFO source=server.go:168 msg=offload library=cuda layers.requested=999 layers.model=35 layers.offload=35 layers.split=18,17 memory.available="[11.0 GiB 11.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="8.1 GiB" memory.required.partial="8.1 GiB" memory.required.kv="334.0 MiB" memory.required.allocations="[5.5 GiB 2.6 GiB]" memory.weights.total="2.9 GiB" memory.weights.repeating="1.7 GiB" memory.weights.nonrepeating="1.3 GiB" memory.graph.full="1.0 GiB" memory.graph.partial="1.0 GiB" projector.weights="806.2 MiB" projector.graph="1.0 GiB"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.793-07:00 level=INFO source=server.go:211 msg="enabling flash attention"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.793-07:00 level=DEBUG source=server.go:284 msg="compatible gpu libraries" compatible=[]
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.814-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.815-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.815-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=INFO source=server.go:431 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd --ctx-size 8192 --batch-size 512 --n-gpu-layers 999 --threads 10 --flash-attn --kv-cache-type f16 --parallel 1 --tensor-split 18,17 --port 38963"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=DEBUG source=server.go:432 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/snap/bin OLLAMA_KEEP_ALIVE=-1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=f16 LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/share/ollama:/usr/local/lib/ollama OLLAMA_DEBUG=1 OLLAMA_MAX_LOADED_MODELS=6 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama CUDA_VISIBLE_DEVICES=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea,GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.818-07:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.828-07:00 level=INFO source=runner.go:925 msg="starting ollama engine"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.831-07:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:38963"
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.description default=""
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_0 name="Gemma3 4b It Qa_0 Qat Hf" description="" num_tensors=883 num_key_values=42
Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
Jun 29 20:22:45 Aidev numactl[115262]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Jun 29 20:22:45 Aidev numactl[115262]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Jun 29 20:22:45 Aidev numactl[115262]: ggml_cuda_init: found 2 CUDA devices:
Jun 29 20:22:45 Aidev numactl[115262]:   Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
Jun 29 20:22:45 Aidev numactl[115262]:   Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
Jun 29 20:22:46 Aidev numactl[115262]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so
Jun 29 20:22:46 Aidev numactl[115262]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.020-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.069-07:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.117-07:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CUDA0 size="912.0 MiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.117-07:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CUDA1 size="2.8 GiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.117-07:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CPU size="1.3 GiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.118-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.118-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=972 splits=1
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="0 B"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA1 buffer_type=CUDA1 size="1.1 GiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="0 B"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.368-07:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=1369 splits=3
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="131.0 MiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA1 buffer_type=CUDA1 size="1.1 GiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="5.0 MiB"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=DEBUG source=runner.go:883 msg=memory allocated.InputWeights=1342504960A allocated.CPU.Graph=5242880A allocated.CUDA0.Weights="[53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U]" allocated.CUDA0.Cache="[6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U]" allocated.CUDA0.Graph=137371648A allocated.CUDA1.Weights="[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 2193742976A]" allocated.CUDA1.Cache="[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 0U]" allocated.CUDA1.Graph=1212612608A
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.570-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.12"
Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.822-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.30"
Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.073-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.41"
Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.324-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.53"
Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.575-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.64"
Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.826-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.74"
Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.077-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.78"
Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.328-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.81"
Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.579-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.85"
Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.830-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.89"
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.081-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.93"
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.331-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.97"
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.582-07:00 level=INFO source=server.go:630 msg="llama runner started in 3.76 seconds"
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.582-07:00 level=DEBUG source=sched.go:495 msg="finished setting up" runner.name=registry.ollama.ai/library/hfs-ai-model-test:latest runner.inference=cuda runner.devices=2 runner.size="8.1 GiB" runner.vram="8.1 GiB" runner.parallel=1 runner.pid=115323 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd runner.num_ctx=8192
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.743-07:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3145 format=""
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.902-07:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.902-07:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=0 prompt=764 used=0 remaining=764
Jun 29 20:22:52 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:52 | 200 |  6.509760274s |       127.0.0.1 | POST     "/v1/chat/completions"
Jun 29 20:22:52 Aidev numactl[115262]: time=2025-06-29T20:22:52.781-07:00 level=DEBUG source=sched.go:503 msg="context for request finished"
Jun 29 20:22:52 Aidev numactl[115262]: time=2025-06-29T20:22:52.781-07:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/hfs-ai-model-test:latest runner.inference=cuda runner.devices=2 runner.size="8.1 GiB" runner.vram="8.1 GiB" runner.parallel=1 runner.pid=115323 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd runner.num_ctx=8192 duration=2562047h47m16.854775807s
Jun 29 20:22:52 Aidev numactl[115262]: time=2025-06-29T20:22:52.781-07:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/hfs-ai-model-test:latest runner.inference=cuda runner.devices=2 runner.size="8.1 GiB" runner.vram="8.1 GiB" runner.parallel=1 runner.pid=115323 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd runner.num_ctx=8192 refCount=0
Jun 29 20:22:52 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:52 | 200 |      28.105µs |       127.0.0.1 | GET      "/"
Jun 29 20:23:02 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:23:02 | 200 |      26.642µs |       127.0.0.1 | GET      "/"
Jun 29 20:23:02 Aidev numactl[115262]: time=2025-06-29T20:23:02.624-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32
Jun 29 20:23:02 Aidev numactl[115262]: time=2025-06-29T20:23:02.625-07:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd
Jun 29 20:23:45 Aidev numactl[115262]: time=2025-06-29T20:23:45.176-07:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=18213 format=""
Jun 29 20:23:55 Aidev numactl[115262]: time=2025-06-29T20:23:55.303-07:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2]
Jun 29 20:23:55 Aidev numactl[115262]: time=2025-06-29T20:23:55.303-07:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=792 prompt=5355 used=63 remaining=5292
Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.054-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache"
Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.342-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache"
Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.544-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache"
Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.892-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache"
Jun 29 20:23:57 Aidev numactl[115262]: time=2025-06-29T20:23:57.101-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache"

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

Logs are from v0.9.2, which I downgraded to from v0.9.3 while troubleshooting.

Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:44 Aidev numactl[115262]: device count 2 Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.686-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.776-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:44 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.776-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[11.0 GiB]" Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.777-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB" Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:44 Aidev numactl[115262]: device count 2 Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.865-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.957-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 
GiB" now.used="1.0 GiB" Jun 29 20:22:44 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.958-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=1 available="[11.0 GiB]" Jun 29 20:22:44 Aidev numactl[115262]: time=2025-06-29T20:22:44.958-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB" Jun 29 20:22:44 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13 Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:44 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:44 Aidev numactl[115262]: calling cuInit Jun 29 20:22:44 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:44 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:44 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:44 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:44 Aidev numactl[115262]: device count 2 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.046-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.131-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.132-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=2 available="[11.0 GiB 11.0 GiB]" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.133-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: 
cuDeviceGetName - 0x73202ec21e13 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:45 Aidev numactl[115262]: device count 2 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.213-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.298-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.298-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=2 available="[11.0 GiB 11.0 GiB]" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.299-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:45 Aidev numactl[115262]: device count 2 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.381-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.465-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" 
gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.466-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:45 Aidev numactl[115262]: device count 2 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.543-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=INFO source=server.go:135 msg="system memory" total="31.2 GiB" free="30.0 GiB" free_swap="8.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=DEBUG source=memory.go:111 msg=evaluating library=cuda gpu_count=2 available="[11.0 GiB 11.0 GiB]" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.633-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="31.2 GiB" before.free="30.0 GiB" before.free_swap="8.0 GiB" now.total="31.2 GiB" now.free="30.0 GiB" now.free_swap="8.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: initializing /usr/lib/wsl/lib/libcuda.so Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuInit - 0x73202ec21dd0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDriverGetVersion - 0x73202ec21d90 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetCount - 0x73202ec21e0d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGet - 0x73202ec21e07 Jun 29 20:22:45 Aidev 
numactl[115262]: dlsym: cuDeviceGetAttribute - 0x73202ec21cf0 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetUuid - 0x73202ec21e19 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuDeviceGetName - 0x73202ec21e13 Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxCreate_v3 - 0x73202ec21e8b Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuMemGetInfo_v2 - 0x73202ec21f8d Jun 29 20:22:45 Aidev numactl[115262]: dlsym: cuCtxDestroy - 0x73202ec21e9d Jun 29 20:22:45 Aidev numactl[115262]: calling cuInit Jun 29 20:22:45 Aidev numactl[115262]: calling cuDriverGetVersion Jun 29 20:22:45 Aidev numactl[115262]: raw version 0x2f3a Jun 29 20:22:45 Aidev numactl[115262]: CUDA driver version: 12.9 Jun 29 20:22:45 Aidev numactl[115262]: calling cuDeviceGetCount Jun 29 20:22:45 Aidev numactl[115262]: device count 2 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.710-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.792-07:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f name="NVIDIA GeForce RTX 3060" overhead="0 B" before.total="12.0 GiB" before.free="11.0 GiB" now.total="12.0 GiB" now.free="11.0 GiB" now.used="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: releasing cuda driver library Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.793-07:00 level=INFO source=server.go:168 msg=offload library=cuda layers.requested=999 layers.model=35 layers.offload=35 layers.split=18,17 memory.available="[11.0 GiB 11.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="8.1 GiB" memory.required.partial="8.1 GiB" memory.required.kv="334.0 MiB" memory.required.allocations="[5.5 GiB 2.6 GiB]" memory.weights.total="2.9 GiB" memory.weights.repeating="1.7 GiB" memory.weights.nonrepeating="1.3 GiB" memory.graph.full="1.0 GiB" memory.graph.partial="1.0 GiB" projector.weights="806.2 MiB" projector.graph="1.0 GiB" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.793-07:00 level=INFO source=server.go:211 msg="enabling flash attention" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.793-07:00 level=DEBUG source=server.go:284 msg="compatible gpu libraries" compatible=[] Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.814-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.815-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.815-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1 Jun 29 20:22:45 Aidev numactl[115262]: 
time=2025-06-29T20:22:45.816-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=INFO source=server.go:431 msg="starting llama server" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd --ctx-size 8192 --batch-size 512 --n-gpu-layers 999 --threads 10 --flash-attn --kv-cache-type f16 --parallel 1 --tensor-split 18,17 --port 38963" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=DEBUG source=server.go:432 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/snap/bin OLLAMA_KEEP_ALIVE=-1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=f16 LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/share/ollama:/usr/local/lib/ollama OLLAMA_DEBUG=1 OLLAMA_MAX_LOADED_MODELS=6 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama CUDA_VISIBLE_DEVICES=GPU-c6676d00-cd95-9a0e-46b3-f92becc20fea,GPU-aa5c4c9d-7ba4-53f2-f3d2-9b12ddf79c6f Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.817-07:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.818-07:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.828-07:00 level=INFO source=runner.go:925 msg="starting ollama engine" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.831-07:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:38963" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.description default="" Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=INFO source=ggml.go:92 msg="" architecture=gemma3 file_type=Q4_0 name="Gemma3 4b It Qa_0 Qat Hf" description="" num_tensors=883 num_key_values=42 Jun 29 20:22:45 Aidev numactl[115262]: time=2025-06-29T20:22:45.853-07:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama Jun 29 20:22:45 Aidev numactl[115262]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no Jun 29 20:22:45 Aidev numactl[115262]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no Jun 29 20:22:45 Aidev numactl[115262]: ggml_cuda_init: found 2 CUDA devices: Jun 29 20:22:45 Aidev numactl[115262]: Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes Jun 29 20:22:45 Aidev numactl[115262]: Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes Jun 29 20:22:46 Aidev numactl[115262]: load_backend: loaded CUDA backend from /usr/local/lib/ollama/libggml-cuda.so Jun 29 20:22:46 Aidev numactl[115262]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.020-07:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 
CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.069-07:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.117-07:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CUDA0 size="912.0 MiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.117-07:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CUDA1 size="2.8 GiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.117-07:00 level=INFO source=ggml.go:351 msg="model weights" buffer=CPU size="1.3 GiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.118-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eot_token_id default=106 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.118-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.local.freq_base default=10000 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.rope.freq_scale default=1 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.119-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=gemma3.mm_tokens_per_image default=256 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=972 splits=1 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="0 B" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA1 buffer_type=CUDA1 size="1.1 GiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.319-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="0 B" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.368-07:00 level=DEBUG source=ggml.go:620 msg="compute graph" nodes=1369 splits=3 Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA0 buffer_type=CUDA0 size="131.0 MiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CUDA1 buffer_type=CUDA1 size="1.1 GiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=INFO source=ggml.go:638 msg="compute graph" backend=CPU buffer_type=CPU size="5.0 MiB" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.369-07:00 level=DEBUG source=runner.go:883 msg=memory allocated.InputWeights=1342504960A allocated.CPU.Graph=5242880A allocated.CUDA0.Weights="[53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 
53127168A 53127168A 53127168A 53127168A 53127168A 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U]" allocated.CUDA0.Cache="[6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U]" allocated.CUDA0.Graph=137371648A allocated.CUDA1.Weights="[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 53127168A 2193742976A]" allocated.CUDA1.Cache="[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 6291456A 33554432A 6291456A 6291456A 6291456A 6291456A 0U]" allocated.CUDA1.Graph=1212612608A Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.570-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.12" Jun 29 20:22:46 Aidev numactl[115262]: time=2025-06-29T20:22:46.822-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.30" Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.073-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.41" Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.324-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.53" Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.575-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.64" Jun 29 20:22:47 Aidev numactl[115262]: time=2025-06-29T20:22:47.826-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.74" Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.077-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.78" Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.328-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.81" Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.579-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.85" Jun 29 20:22:48 Aidev numactl[115262]: time=2025-06-29T20:22:48.830-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.89" Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.081-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.93" Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.331-07:00 level=DEBUG source=server.go:636 msg="model load progress 0.97" Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.582-07:00 level=INFO source=server.go:630 msg="llama runner started in 3.76 seconds" Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.582-07:00 level=DEBUG source=sched.go:495 msg="finished setting up" runner.name=registry.ollama.ai/library/hfs-ai-model-test:latest runner.inference=cuda runner.devices=2 runner.size="8.1 GiB" runner.vram="8.1 GiB" runner.parallel=1 runner.pid=115323 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd runner.num_ctx=8192 Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.743-07:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=3145 format="" Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.902-07:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] Jun 29 20:22:49 Aidev numactl[115262]: time=2025-06-29T20:22:49.902-07:00 
level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=0 prompt=764 used=0 remaining=764 Jun 29 20:22:52 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:52 | 200 | 6.509760274s | 127.0.0.1 | POST "/v1/chat/completions" Jun 29 20:22:52 Aidev numactl[115262]: time=2025-06-29T20:22:52.781-07:00 level=DEBUG source=sched.go:503 msg="context for request finished" Jun 29 20:22:52 Aidev numactl[115262]: time=2025-06-29T20:22:52.781-07:00 level=DEBUG source=sched.go:343 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/hfs-ai-model-test:latest runner.inference=cuda runner.devices=2 runner.size="8.1 GiB" runner.vram="8.1 GiB" runner.parallel=1 runner.pid=115323 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd runner.num_ctx=8192 duration=2562047h47m16.854775807s Jun 29 20:22:52 Aidev numactl[115262]: time=2025-06-29T20:22:52.781-07:00 level=DEBUG source=sched.go:361 msg="after processing request finished event" runner.name=registry.ollama.ai/library/hfs-ai-model-test:latest runner.inference=cuda runner.devices=2 runner.size="8.1 GiB" runner.vram="8.1 GiB" runner.parallel=1 runner.pid=115323 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd runner.num_ctx=8192 refCount=0 Jun 29 20:22:52 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:22:52 | 200 | 28.105µs | 127.0.0.1 | GET "/" Jun 29 20:23:02 Aidev numactl[115262]: [GIN] 2025/06/29 - 20:23:02 | 200 | 26.642µs | 127.0.0.1 | GET "/" Jun 29 20:23:02 Aidev numactl[115262]: time=2025-06-29T20:23:02.624-07:00 level=DEBUG source=ggml.go:155 msg="key not found" key=general.alignment default=32 Jun 29 20:23:02 Aidev numactl[115262]: time=2025-06-29T20:23:02.625-07:00 level=DEBUG source=sched.go:615 msg="evaluating already loaded" model=/usr/share/ollama/.ollama/models/blobs/sha256-529850705c0884a283b87d3b261d36ee30821e16f0310962ba977b456ad3b8cd Jun 29 20:23:45 Aidev numactl[115262]: time=2025-06-29T20:23:45.176-07:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=18213 format="" Jun 29 20:23:55 Aidev numactl[115262]: time=2025-06-29T20:23:55.303-07:00 level=DEBUG source=vocabulary.go:52 msg="adding bos token to prompt" id=[2] Jun 29 20:23:55 Aidev numactl[115262]: time=2025-06-29T20:23:55.303-07:00 level=DEBUG source=cache.go:136 msg="loading cache slot" id=0 cache=792 prompt=5355 used=63 remaining=5292 Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.054-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache" Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.342-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache" Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.544-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache" Jun 29 20:23:56 Aidev numactl[115262]: time=2025-06-29T20:23:56.892-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache" Jun 29 20:23:57 Aidev numactl[115262]: time=2025-06-29T20:23:57.101-07:00 level=DEBUG source=causal.go:386 msg="defragmenting kv cache" ``` ### OS Windows ### GPU Nvidia ### CPU Intel ### Ollama version Logs from 0.9.2 which I had downgraded to from 0.9.3 in trying to troubleshoot.
GiteaMirror added the bug label 2026-05-04 18:11:22 -05:00

@eowensai commented on GitHub (Jul 3, 2025):

I'm providing an update to this issue that resolved one problem seen in the logs, **although the CPU-only prefill persists**.

Initially, the Ollama logs consistently showed an empty list of compatible GPU libraries:

`source=server.go:284 msg="compatible gpu libraries" compatible=[]`

I think this was due to a conflict where libnvidia-ml.so from Ubuntu's libnvidia-compute-535 package was being prioritized over the WSL2 stubs that are actually needed.
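One way to check which copies the dynamic linker resolves (a sanity check sketch; exact output will vary by system):

```
# List every libcuda / libnvidia-ml entry in the linker cache.
# In a working WSL2 setup, the /usr/lib/wsl/lib entries should take priority.
ldconfig -p | grep -E 'libcuda\.so|libnvidia-ml\.so'
```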
The following was done (a sketch of steps 1 and 3 follows the list):

1. **Isolated conflicting libraries**: moved the libnvidia-ml.so* files out of /usr/lib/x86_64-linux-gnu/ into a backup directory, then ran sudo ldconfig.
2. **Installed Ollama's CUDA backend**: downloaded the NVIDIA-enabled ollama-linux-amd64.tgz build and extracted it; created /usr/local/lib/ollama/cuda_v12 and moved the CUDA 12 libraries there (they had extracted under /usr/lib/ollama).
3. **Configured the systemd service**: created a systemd override (/etc/systemd/system/ollama.service.d/override.conf) that explicitly sets the following environment variables:
   - OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama/cuda_v12
   - LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/lib/ollama/cuda_v12
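A minimal sketch of steps 1 and 3, assuming the stock `ollama` systemd unit; the backup directory name is hypothetical:

```
# Step 1 (sketch): move the conflicting Ubuntu NVML libraries aside and
# refresh the linker cache. ~/nvml-backup is a hypothetical location.
mkdir -p ~/nvml-backup
sudo mv /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* ~/nvml-backup/
sudo ldconfig

# Step 3 (sketch): systemd override pointing Ollama at the WSL2 stubs
# and the relocated CUDA 12 backend.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
[Service]
Environment="OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama/cuda_v12"
Environment="LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/lib/ollama/cuda_v12"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```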
After a daemon-reload and service restart, the logs now correctly show that the CUDA backend is found:

`"compatible gpu libraries" compatible=[cuda_v12]`
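To confirm after a restart (assuming Ollama runs under systemd with journal logging, as in the logs above):

```
# Check the current boot's service journal for the GPU-library line.
journalctl -u ollama -b | grep "compatible gpu libraries"
```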


@eowensai commented on GitHub (Jul 3, 2025):

Today's attempts, still no luck getting prefill to run on the GPU(s); a sketch of combining these toggles for a one-off test follows the list:

- Rebooted (it's Windows, after all)
- CUDA_VISIBLE_DEVICES=0 (from 0,1)
- OLLAMA_FLASH_ATTENTION=0 (was 1)
- Tried the gemma3 12b version of the it-qat model (up from 4b)
- OLLAMA_KV_CACHE_TYPE=f16 (previously left undefined; logs now report "OLLAMA_KV_CACHE_TYPE:f16" where before it was "OLLAMA_KV_CACHE_TYPE: ")
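A sketch of a one-off test run combining these toggles outside the systemd service (assumes the service is stopped and `ollama serve` is run manually):

```
# Hypothetical single-run test: one GPU, flash attention off, f16 KV cache,
# debug logging on. Stop the service first so the ports don't collide.
sudo systemctl stop ollama
CUDA_VISIBLE_DEVICES=0 \
OLLAMA_FLASH_ATTENTION=0 \
OLLAMA_KV_CACHE_TYPE=f16 \
OLLAMA_DEBUG=1 \
ollama serve
```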

@Atliac commented on GitHub (Jul 11, 2025):

I have the same issue.

Host OS: Windows 10 (native)
Ollama version: 0.9.6
GPU: NVIDIA GeForce RTX 4060 Ti (8 GB)
Model: gemma-3-4b-it-qat.gguf

Reference: github-starred/ollama#69464