[GH-ISSUE #12431] Slow model loading #34016

Closed
opened 2026-04-22 17:14:41 -05:00 by GiteaMirror · 4 comments

Originally created by @asdnemasd on GitHub (Sep 27, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12431

What is the issue?

With Ollama v0.12.1, the Qwen3-Coder-30B-A3B model loads at around 100 MB/s. With the latest version (0.12.3, per the log below), the same model loads at only around 30 MB/s, roughly a third of the previous speed. In both cases, the model is loaded from an HDD.
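To separate disk speed from loader behaviour, it may help to measure how fast the model blob can be read sequentially outside of Ollama. The following is a minimal sketch, not part of Ollama: the function name `read_throughput_mib_s` and its parameters are hypothetical, and the result is only a rough baseline because the OS file cache can inflate the number on a second run.

```python
import time


def read_throughput_mib_s(path, chunk_size=8 * 1024 * 1024, max_bytes=None):
    """Read `path` sequentially and return the observed rate in MiB/s.

    `max_bytes` optionally stops the read early (useful for very large files).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS cache may still apply.
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return total / (1024 * 1024) / elapsed
```

Calling it with the blob path shown in the "loading first model" log line (for example with `max_bytes=2 * 1024**3` to read only the first 2 GiB) gives a figure to compare against the 100 MB/s and 30 MB/s rates reported here. If the raw read is close to 100 MB/s, the slowdown is more likely in how the newer version reads the file than in the disk itself.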

Relevant log output

time=2025-09-27T19:30:06.369+02:00 level=INFO source=routes.go:1475 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2562047h47m16.854775807s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:D:\\ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:true OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-09-27T19:30:06.437+02:00 level=INFO source=images.go:518 msg="total blobs: 36"
time=2025-09-27T19:30:06.439+02:00 level=INFO source=images.go:525 msg="total unused blobs removed: 0"
time=2025-09-27T19:30:06.440+02:00 level=INFO source=routes.go:1528 msg="Listening on 127.0.0.1:11434 (version 0.12.3)"
time=2025-09-27T19:30:06.440+02:00 level=DEBUG source=sched.go:121 msg="starting llm scheduler"
time=2025-09-27T19:30:06.440+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-09-27T19:30:06.440+02:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-09-27T19:30:06.441+02:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=6 efficiency=0 threads=6
time=2025-09-27T19:30:06.441+02:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-09-27T19:30:06.441+02:00 level=DEBUG source=gpu.go:520 msg="Searching for GPU library" name=nvml.dll
time=2025-09-27T19:30:06.441+02:00 level=DEBUG source=gpu.go:544 msg="gpu library search" globs="[C:\\Users\\asd\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\nvml.dll C:\\Program Files\\ImageMagick-7.1.2-Q16-HDRI\\nvml.dll C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.9\\bin\\nvml.dll C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.9\\libnvvp\\nvml.dll C:\\Program Files\\ImageMagick-7.1.1-Q16-HDRI\\nvml.dll C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.8\\bin\\nvml.dll C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.8\\libnvvp\\nvml.dll c:\\program files\\nvidia gpu computing toolkit\\cuda\\v11.8\\bin\\nvml.dll c:\\program files\\nvidia gpu computing toolkit\\cuda\\v11.8\\libnvvp\\nvml.dll c:\\program files\\eclipse adoptium\\jre-21.0.6.7-hotspot\\bin\\nvml.dll c:\\program files\\lapce\\nvml.dll c:\\program files\\imagemagick-7.1.1-q16-hdri\\nvml.dll c:\\p\\ffmpeg-6.1.1-full_build\\bin\\nvml.dll c:\\program files\\dotnet\\nvml.dll c:\\windows\\system32\\windowspowershell\\v1.0\\nvml.dll c:\\program files\\gource\\cmd\\nvml.dll c:\\program files\\nodejs\\nvml.dll c:\\program files\\git\\cmd\\nvml.dll c:\\program files\\exiftoolgui\\nvml.dll c:\\program files\\git\\cmd\\nvml.dll c:\\program files\\wireguard\\nvml.dll C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common\\nvml.dll C:\\Program Files\\NVIDIA Corporation\\NVIDIA App\\NvDLISR\\nvml.dll C:\\Program Files\\Sunshine\\nvml.dll C:\\Program Files\\Sunshine\\tools\\nvml.dll C:\\Program Files\\Git\\cmd\\nvml.dll C:\\Program Files (x86)\\ZeroTier\\One\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Launcher\\nvml.dll C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.8\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Python39\\Scripts\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Python39\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Python312\\Scripts\\nvml.dll 
C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Python312\\nvml.dll C:\\Users\\asd\\AppData\\Local\\UniGetUI\\Chocolatey\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Python310\\Scripts\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Python310\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\WingetUI\\choco-cli\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Microsoft\\WindowsApps\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\VSCodium\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Hyper\\resources\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Ollama\\nvml.dll C:\\Users\\asd\\.dotnet\\tools\\nvml.dll C:\\msys64\\mingw64\\bin\\nvml.dll C:\\msys64\\usr\\bin\\nvml.dll C:\\Windows\\System32\\nvml.dll C:\\p\\path\\nvml.dll C:\\Program Files\\Airshipper\\nvml.dll C:\\Users\\asd\\.dotnet\\tools\\nvml.dll c:\\users\\asd\\.local\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Roaming\\npm\\nvml.dll C:\\Users\\asd\\.lmstudio\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Microsoft\\WinGet\\Packages\\Gyan.FFmpeg.Essentials_Microsoft.Winget.Source_8wekyb3d8bbwe\\ffmpeg-7.1.1-essentials_build\\bin\\nvml.dll C:\\Users\\asd\\.dotnet\\tools\\nvml.dll C:\\Users\\asd\\nvml.dll c:\\Windows\\System32\\nvml.dll]"
time=2025-09-27T19:30:06.442+02:00 level=DEBUG source=gpu.go:548 msg="skipping PhysX cuda library path" path="C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common\\nvml.dll"
time=2025-09-27T19:30:06.444+02:00 level=DEBUG source=gpu.go:577 msg="discovered GPU libraries" paths="[C:\\Windows\\System32\\nvml.dll c:\\Windows\\System32\\nvml.dll]"
time=2025-09-27T19:30:06.466+02:00 level=DEBUG source=gpu.go:111 msg="nvidia-ml loaded" library=C:\Windows\System32\nvml.dll
time=2025-09-27T19:30:06.466+02:00 level=DEBUG source=gpu.go:520 msg="Searching for GPU library" name=nvcuda.dll
time=2025-09-27T19:30:06.468+02:00 level=DEBUG source=gpu.go:548 msg="skipping PhysX cuda library path" path="C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common\\nvcuda.dll"
time=2025-09-27T19:30:06.471+02:00 level=DEBUG source=gpu.go:577 msg="discovered GPU libraries" paths=[C:\Windows\System32\nvcuda.dll]
initializing C:\Windows\System32\nvcuda.dll
dlsym: cuInit - 00007FFADD1D1F80
dlsym: cuDriverGetVersion - 00007FFADD1D2020
dlsym: cuDeviceGetCount - 00007FFADD1D2816
dlsym: cuDeviceGet - 00007FFADD1D2810
dlsym: cuDeviceGetAttribute - 00007FFADD1D2170
dlsym: cuDeviceGetUuid - 00007FFADD1D2822
dlsym: cuDeviceGetName - 00007FFADD1D281C
dlsym: cuCtxCreate_v3 - 00007FFADD1D2894
dlsym: cuMemGetInfo_v2 - 00007FFADD1D2996
dlsym: cuCtxDestroy - 00007FFADD1D28A6
calling cuInit
calling cuDriverGetVersion
raw version 0x2f3a
CUDA driver version: 12.9
calling cuDeviceGetCount
device count 1
time=2025-09-27T19:30:06.495+02:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=C:\Windows\System32\nvcuda.dll
[GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0] CUDA totalMem 16379mb
[GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0] CUDA freeMem 15225mb
[GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0] Compute Capability 8.9
time=2025-09-27T19:30:06.597+02:00 level=WARN source=cuda_common.go:60 msg="old CUDA driver detected - please upgrade to a newer driver for best performance" version=12.9
time=2025-09-27T19:30:06.600+02:00 level=DEBUG source=amd_windows.go:34 msg="unable to load amdhip64_6.dll, please make sure to upgrade to the latest amd driver: The specified module could not be found."
releasing cuda driver library
releasing nvml library
time=2025-09-27T19:30:06.602+02:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 library=cuda variant=v12 compute=8.9 driver=12.9 name="NVIDIA GeForce RTX 4060 Ti" total="16.0 GiB" available="14.9 GiB"
time=2025-09-27T19:30:06.602+02:00 level=INFO source=routes.go:1569 msg="entering low vram mode" "total vram"="16.0 GiB" threshold="20.0 GiB"
[GIN] 2025/09/27 - 19:30:12 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-09-27T19:30:12.270+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
[GIN] 2025/09/27 - 19:30:12 | 200 |    102.5496ms |       127.0.0.1 | POST     "/api/show"
time=2025-09-27T19:30:12.356+02:00 level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="23.9 GiB" before.free="18.4 GiB" before.free_swap="24.6 GiB" now.total="23.9 GiB" now.free="18.4 GiB" now.free_swap="24.5 GiB"
time=2025-09-27T19:30:12.377+02:00 level=DEBUG source=gpu.go:460 msg="updating cuda memory data" gpu=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 name="NVIDIA GeForce RTX 4060 Ti" overhead="0 B" before.total="16.0 GiB" before.free="14.9 GiB" now.total="16.0 GiB" now.free="14.7 GiB" now.used="1.3 GiB"
releasing nvml library
time=2025-09-27T19:30:12.379+02:00 level=DEBUG source=sched.go:188 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-09-27T19:30:12.394+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-09-27T19:30:12.397+02:00 level=DEBUG source=sched.go:208 msg="loading first model" model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1
time=2025-09-27T19:30:12.453+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-09-27T19:30:12.456+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-09-27T19:30:12.456+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-09-27T19:30:12.457+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-09-27T19:30:12.457+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-09-27T19:30:12.460+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1
time=2025-09-27T19:30:12.460+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true
time=2025-09-27T19:30:12.461+02:00 level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="23.9 GiB" before.free="18.4 GiB" before.free_swap="24.5 GiB" now.total="23.9 GiB" now.free="18.4 GiB" now.free_swap="24.5 GiB"
time=2025-09-27T19:30:12.486+02:00 level=DEBUG source=gpu.go:460 msg="updating cuda memory data" gpu=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 name="NVIDIA GeForce RTX 4060 Ti" overhead="0 B" before.total="16.0 GiB" before.free="14.7 GiB" now.total="16.0 GiB" now.free="14.7 GiB" now.used="1.3 GiB"
releasing nvml library
time=2025-09-27T19:30:12.489+02:00 level=INFO source=server.go:217 msg="enabling flash attention"
time=2025-09-27T19:30:12.499+02:00 level=DEBUG source=server.go:324 msg="adding gpu library" path=C:\Users\asd\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12
time=2025-09-27T19:30:12.499+02:00 level=DEBUG source=server.go:332 msg="adding gpu dependency paths" paths=[C:\Users\asd\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12]
time=2025-09-27T19:30:12.500+02:00 level=INFO source=server.go:399 msg="starting runner" cmd="C:\\Users\\asd\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model D:\\ollama\\models\\blobs\\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 --port 60264"
time=2025-09-27T19:30:12.507+02:00 level=INFO source=server.go:672 msg="loading model" "model layers"=49 requested=99
time=2025-09-27T19:30:12.508+02:00 level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="23.9 GiB" before.free="18.4 GiB" before.free_swap="24.5 GiB" now.total="23.9 GiB" now.free="18.4 GiB" now.free_swap="24.5 GiB"
time=2025-09-27T19:30:12.517+02:00 level=DEBUG source=gpu.go:460 msg="updating cuda memory data" gpu=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 name="NVIDIA GeForce RTX 4060 Ti" overhead="0 B" before.total="16.0 GiB" before.free="14.7 GiB" now.total="16.0 GiB" now.free="14.7 GiB" now.used="1.3 GiB"
releasing nvml library
time=2025-09-27T19:30:12.519+02:00 level=INFO source=server.go:678 msg="system memory" total="23.9 GiB" free="18.4 GiB" free_swap="24.5 GiB"
time=2025-09-27T19:30:12.519+02:00 level=INFO source=server.go:686 msg="gpu memory" id=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 available="14.2 GiB" free="14.7 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-09-27T19:30:12.548+02:00 level=INFO source=runner.go:1252 msg="starting ollama engine"
time=2025-09-27T19:30:12.559+02:00 level=INFO source=runner.go:1287 msg="Server listening on 127.0.0.1:60264"
time=2025-09-27T19:30:12.563+02:00 level=INFO source=runner.go:1171 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:6 GPULayers:49[ID:GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-09-27T19:30:12.595+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-09-27T19:30:12.598+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""
time=2025-09-27T19:30:12.599+02:00 level=INFO source=ggml.go:131 msg="" architecture=qwen3moe file_type=Q3_K_S name=Qwen3-Coder-30B-A3B-Instruct description="" num_tensors=579 num_key_values=45
time=2025-09-27T19:30:12.600+02:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\asd\AppData\Local\Programs\Ollama\lib\ollama
load_backend: loaded CPU backend from C:\Users\asd\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2025-09-27T19:30:13.385+02:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\asd\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes, ID: GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0
load_backend: loaded CUDA backend from C:\Users\asd\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12\ggml-cuda.dll
time=2025-09-27T19:30:14.894+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2025-09-27T19:30:14.901+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-09-27T19:30:14.901+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-09-27T19:30:14.901+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-09-27T19:30:14.901+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-09-27T19:30:14.901+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1
time=2025-09-27T19:30:14.901+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true
time=2025-09-27T19:30:15.008+02:00 level=DEBUG source=ggml.go:794 msg="compute graph" nodes=2982 splits=2
time=2025-09-27T19:30:15.009+02:00 level=DEBUG source=backend.go:310 msg="model weights" device=CUDA0 size="12.2 GiB"
time=2025-09-27T19:30:15.010+02:00 level=DEBUG source=backend.go:315 msg="model weights" device=CPU size="127.5 MiB"
time=2025-09-27T19:30:15.010+02:00 level=DEBUG source=backend.go:321 msg="kv cache" device=CUDA0 size="768.0 MiB"
time=2025-09-27T19:30:15.010+02:00 level=DEBUG source=backend.go:332 msg="compute graph" device=CUDA0 size="92.0 MiB"
time=2025-09-27T19:30:15.011+02:00 level=DEBUG source=backend.go:337 msg="compute graph" device=CPU size="4.0 MiB"
time=2025-09-27T19:30:15.012+02:00 level=DEBUG source=backend.go:342 msg="total memory" size="13.2 GiB"
time=2025-09-27T19:30:15.012+02:00 level=DEBUG source=server.go:717 msg=memory success=true required.InputWeights=133703680U required.CPU.Graph=4194304U required.CUDA0.ID=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 required.CUDA0.Weights="[268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 255260672U]" required.CUDA0.Cache="[16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 0U]" required.CUDA0.Graph=96471168U
time=2025-09-27T19:30:15.013+02:00 level=DEBUG source=server.go:894 msg="available gpu" id=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 "available layer vram"="14.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="92.0 MiB"
time=2025-09-27T19:30:15.013+02:00 level=DEBUG source=server.go:728 msg="new layout created" layers="49[ID:GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 Layers:49(0..48)]"
time=2025-09-27T19:30:15.013+02:00 level=INFO source=runner.go:1171 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:6 GPULayers:49[ID:GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-09-27T19:30:15.041+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-09-27T19:30:15.081+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-09-27T19:30:15.081+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-09-27T19:30:15.081+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-09-27T19:30:15.081+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-09-27T19:30:15.081+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1
time=2025-09-27T19:30:15.081+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true
time=2025-09-27T19:30:15.376+02:00 level=DEBUG source=ggml.go:794 msg="compute graph" nodes=2982 splits=2
time=2025-09-27T19:30:15.377+02:00 level=DEBUG source=backend.go:310 msg="model weights" device=CUDA0 size="12.2 GiB"
time=2025-09-27T19:30:15.377+02:00 level=DEBUG source=backend.go:315 msg="model weights" device=CPU size="127.5 MiB"
time=2025-09-27T19:30:15.378+02:00 level=DEBUG source=backend.go:321 msg="kv cache" device=CUDA0 size="768.0 MiB"
time=2025-09-27T19:30:15.378+02:00 level=DEBUG source=backend.go:332 msg="compute graph" device=CUDA0 size="92.0 MiB"
time=2025-09-27T19:30:15.378+02:00 level=DEBUG source=backend.go:337 msg="compute graph" device=CPU size="4.0 MiB"
time=2025-09-27T19:30:15.379+02:00 level=DEBUG source=backend.go:342 msg="total memory" size="13.2 GiB"
time=2025-09-27T19:30:15.379+02:00 level=DEBUG source=server.go:717 msg=memory success=true required.InputWeights=133703680A required.CPU.Graph=4194304A required.CUDA0.ID=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 required.CUDA0.Weights="[268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 255260672A]" required.CUDA0.Cache="[16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 0U]" required.CUDA0.Graph=96471168A
time=2025-09-27T19:30:15.380+02:00 level=DEBUG source=server.go:894 msg="available gpu" id=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 "available layer vram"="14.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="92.0 MiB"
time=2025-09-27T19:30:15.380+02:00 level=DEBUG source=server.go:728 msg="new layout created" layers="49[ID:GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 Layers:49(0..48)]"
time=2025-09-27T19:30:15.381+02:00 level=INFO source=runner.go:1171 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:6 GPULayers:49[ID:GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-09-27T19:30:15.381+02:00 level=INFO source=backend.go:310 msg="model weights" device=CUDA0 size="12.2 GiB"
time=2025-09-27T19:30:15.381+02:00 level=INFO source=backend.go:315 msg="model weights" device=CPU size="127.5 MiB"
time=2025-09-27T19:30:15.382+02:00 level=INFO source=backend.go:321 msg="kv cache" device=CUDA0 size="768.0 MiB"
time=2025-09-27T19:30:15.381+02:00 level=INFO source=ggml.go:487 msg="offloading 48 repeating layers to GPU"
time=2025-09-27T19:30:15.382+02:00 level=INFO source=ggml.go:493 msg="offloading output layer to GPU"
time=2025-09-27T19:30:15.382+02:00 level=INFO source=ggml.go:498 msg="offloaded 49/49 layers to GPU"
time=2025-09-27T19:30:15.382+02:00 level=INFO source=backend.go:332 msg="compute graph" device=CUDA0 size="92.0 MiB"
time=2025-09-27T19:30:15.383+02:00 level=INFO source=backend.go:337 msg="compute graph" device=CPU size="4.0 MiB"
time=2025-09-27T19:30:15.383+02:00 level=INFO source=backend.go:342 msg="total memory" size="13.2 GiB"
time=2025-09-27T19:30:15.384+02:00 level=INFO source=sched.go:470 msg="loaded runners" count=1
time=2025-09-27T19:30:15.386+02:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-09-27T19:30:15.386+02:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
time=2025-09-27T19:30:15.387+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.00"
time=2025-09-27T19:30:15.637+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.00"
time=2025-09-27T19:30:15.888+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.00"
time=2025-09-27T19:30:16.138+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.00"
time=2025-09-27T19:30:16.388+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.00"
time=2025-09-27T19:34:42.457+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70"
time=2025-09-27T19:34:42.708+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70"
time=2025-09-27T19:34:42.958+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70"
time=2025-09-27T19:34:43.208+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70"
time=2025-09-27T19:34:43.459+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70"
time=2025-09-27T19:34:43.709+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70"
time=2025-09-27T19:34:43.959+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70"
time=2025-09-27T19:34:44.209+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70"
time=2025-09-27T19:34:44.459+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:44.709+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:44.959+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:45.209+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:45.460+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:45.710+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:45.960+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:46.211+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:46.461+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:46.712+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:46.962+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:47.212+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:47.463+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:47.713+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:47.963+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71"
time=2025-09-27T19:34:48.213+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:48.463+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:48.713+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:48.964+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:49.214+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:49.464+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:49.714+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:49.965+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:50.215+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:50.466+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:50.716+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:50.967+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:51.217+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:51.467+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:51.717+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-09-27T19:34:51.968+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:52.218+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:52.468+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:52.718+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:52.969+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:53.219+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:53.469+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:53.720+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:53.970+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:54.220+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:54.470+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:54.721+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:54.971+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:55.221+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:55.472+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:55.722+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73"
time=2025-09-27T19:34:55.972+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:56.222+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:56.473+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:56.723+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:56.974+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:57.224+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:57.474+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:57.724+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:57.975+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:58.225+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:58.476+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:58.726+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:58.976+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:59.226+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:59.476+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
time=2025-09-27T19:34:59.727+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:34:59.977+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:00.227+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:00.477+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:00.728+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:00.978+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:01.228+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:01.478+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:01.728+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:01.979+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:02.229+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:02.479+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:02.730+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:02.980+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:03.230+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75"
time=2025-09-27T19:35:03.481+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:03.731+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:03.981+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:04.232+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:04.482+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:04.732+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:04.983+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:05.233+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:05.483+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:05.734+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:05.984+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:06.234+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:06.484+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:06.734+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:06.984+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:07.234+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:07.485+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76"
time=2025-09-27T19:35:07.735+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:07.986+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:08.236+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:08.486+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:08.736+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:08.986+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:09.237+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:09.487+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:09.737+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:09.987+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:10.237+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:10.488+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:10.738+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:10.988+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:11.239+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:11.489+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:11.739+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-09-27T19:35:11.990+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:12.240+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:12.490+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:12.741+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:12.991+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:13.241+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:13.492+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:13.742+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:13.992+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:14.242+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:14.492+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:14.743+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:14.993+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:15.243+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:15.493+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:15.743+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:15.994+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78"
time=2025-09-27T19:35:16.244+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79"
time=2025-09-27T19:35:16.494+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79"
time=2025-09-27T19:35:16.744+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79"
time=2025-09-27T19:35:16.994+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79"
time=2025-09-27T19:35:17.245+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79"
time=2025-09-27T19:35:17.495+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79"
time=2025-09-27T19:35:17.745+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79"
time=2025-09-27T19:35:17.995+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79"
time=2025-09-27T19:35:18.245+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79"
time=2025-09-27T19:35:18.496+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79"
time=2025-09-27T19:35:18.746+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80"
time=2025-09-27T19:35:18.996+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80"
time=2025-09-27T19:35:19.247+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80"
time=2025-09-27T19:35:19.497+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80"
time=2025-09-27T19:35:19.748+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80"
time=2025-09-27T19:35:19.998+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80"
time=2025-09-27T19:35:20.249+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80"
time=2025-09-27T19:35:20.499+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80"
time=2025-09-27T19:35:20.749+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80"
time=2025-09-27T19:35:21.000+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81"
time=2025-09-27T19:35:21.250+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81"
time=2025-09-27T19:35:21.501+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81"
time=2025-09-27T19:35:21.751+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81"
time=2025-09-27T19:35:22.001+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81"
time=2025-09-27T19:35:22.251+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81"
time=2025-09-27T19:35:22.502+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81"
time=2025-09-27T19:35:22.752+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81"
time=2025-09-27T19:35:23.002+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82"
time=2025-09-27T19:35:23.253+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82"
time=2025-09-27T19:35:23.503+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82"
time=2025-09-27T19:35:23.754+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82"
time=2025-09-27T19:35:24.004+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82"
time=2025-09-27T19:35:24.254+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82"
time=2025-09-27T19:35:24.504+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82"
time=2025-09-27T19:35:24.754+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82"
time=2025-09-27T19:35:25.004+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83"
time=2025-09-27T19:35:25.254+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83"
time=2025-09-27T19:35:25.505+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83"
time=2025-09-27T19:35:25.755+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83"
time=2025-09-27T19:35:26.005+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83"
time=2025-09-27T19:35:26.256+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83"
time=2025-09-27T19:35:26.506+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83"
time=2025-09-27T19:35:26.756+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83"
time=2025-09-27T19:35:27.007+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83"
time=2025-09-27T19:35:27.257+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84"
time=2025-09-27T19:35:27.508+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84"
time=2025-09-27T19:35:27.758+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84"
time=2025-09-27T19:35:28.008+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84"
time=2025-09-27T19:35:28.258+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84"
time=2025-09-27T19:35:28.509+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84"
time=2025-09-27T19:35:28.759+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84"
time=2025-09-27T19:35:29.009+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84"
time=2025-09-27T19:35:29.259+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85"
time=2025-09-27T19:35:29.510+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85"
time=2025-09-27T19:35:29.760+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85"
time=2025-09-27T19:35:30.010+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85"
time=2025-09-27T19:35:30.260+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85"
time=2025-09-27T19:35:30.511+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85"
time=2025-09-27T19:35:30.761+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85"
time=2025-09-27T19:35:31.012+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86"
time=2025-09-27T19:35:31.262+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86"
time=2025-09-27T19:35:31.513+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86"
time=2025-09-27T19:35:31.763+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86"
time=2025-09-27T19:35:32.013+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86"
time=2025-09-27T19:35:32.264+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86"
time=2025-09-27T19:35:32.514+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86"
time=2025-09-27T19:35:32.764+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86"
time=2025-09-27T19:35:33.014+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87"
time=2025-09-27T19:35:33.265+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87"
time=2025-09-27T19:35:33.515+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87"
time=2025-09-27T19:35:33.765+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87"
time=2025-09-27T19:35:34.016+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87"
time=2025-09-27T19:35:34.266+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87"
time=2025-09-27T19:35:34.516+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87"
time=2025-09-27T19:35:34.767+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87"
time=2025-09-27T19:35:35.017+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88"
time=2025-09-27T19:35:35.268+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88"
time=2025-09-27T19:35:35.518+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88"
time=2025-09-27T19:35:35.768+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88"
time=2025-09-27T19:35:36.018+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88"
time=2025-09-27T19:35:36.269+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88"
time=2025-09-27T19:35:36.519+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88"
time=2025-09-27T19:35:36.769+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88"
time=2025-09-27T19:35:37.020+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89"
time=2025-09-27T19:35:37.270+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89"
time=2025-09-27T19:35:37.520+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89"
time=2025-09-27T19:35:37.771+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89"
time=2025-09-27T19:35:38.021+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89"
time=2025-09-27T19:35:38.271+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89"
time=2025-09-27T19:35:38.521+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89"
time=2025-09-27T19:35:38.771+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89"
time=2025-09-27T19:35:39.021+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90"
time=2025-09-27T19:35:39.272+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90"
time=2025-09-27T19:35:39.522+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90"
time=2025-09-27T19:35:39.772+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90"
time=2025-09-27T19:35:40.022+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90"
time=2025-09-27T19:35:40.273+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90"
time=2025-09-27T19:35:40.523+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90"
time=2025-09-27T19:35:40.774+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90"
time=2025-09-27T19:35:41.024+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91"
time=2025-09-27T19:35:41.274+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91"
time=2025-09-27T19:35:41.524+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91"
time=2025-09-27T19:35:41.775+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91"
time=2025-09-27T19:35:42.025+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91"
time=2025-09-27T19:35:42.276+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91"
time=2025-09-27T19:35:42.526+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91"
time=2025-09-27T19:35:42.777+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92"
time=2025-09-27T19:35:43.027+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92"
time=2025-09-27T19:35:43.277+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92"
time=2025-09-27T19:35:43.527+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92"
time=2025-09-27T19:35:43.778+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92"
time=2025-09-27T19:35:44.028+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92"
time=2025-09-27T19:35:44.278+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92"
time=2025-09-27T19:35:44.529+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92"
time=2025-09-27T19:35:44.779+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93"
time=2025-09-27T19:35:45.029+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93"
time=2025-09-27T19:35:45.280+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93"
time=2025-09-27T19:35:45.530+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93"
time=2025-09-27T19:35:45.780+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93"
time=2025-09-27T19:35:46.031+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93"
time=2025-09-27T19:35:46.281+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93"
time=2025-09-27T19:35:46.531+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93"
time=2025-09-27T19:35:46.782+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94"
time=2025-09-27T19:35:47.032+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94"
time=2025-09-27T19:35:47.282+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94"
time=2025-09-27T19:35:47.533+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94"
time=2025-09-27T19:35:47.783+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94"
time=2025-09-27T19:35:48.033+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94"
time=2025-09-27T19:35:48.284+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94"
time=2025-09-27T19:35:48.534+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94"
time=2025-09-27T19:35:48.784+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95"
time=2025-09-27T19:35:49.035+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95"
time=2025-09-27T19:35:49.285+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95"
time=2025-09-27T19:35:49.535+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95"
time=2025-09-27T19:35:49.785+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95"
time=2025-09-27T19:35:50.036+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95"
time=2025-09-27T19:35:50.286+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95"
time=2025-09-27T19:35:50.536+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95"
time=2025-09-27T19:35:50.787+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96"
time=2025-09-27T19:35:51.037+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96"
time=2025-09-27T19:35:51.287+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96"
time=2025-09-27T19:35:51.537+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96"
time=2025-09-27T19:35:51.787+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96"
time=2025-09-27T19:35:52.038+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96"
time=2025-09-27T19:35:52.288+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96"
time=2025-09-27T19:35:52.538+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96"
time=2025-09-27T19:35:52.788+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96"
time=2025-09-27T19:35:53.039+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97"
time=2025-09-27T19:35:53.289+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97"
time=2025-09-27T19:35:53.539+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97"
time=2025-09-27T19:35:53.789+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97"
time=2025-09-27T19:35:54.040+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97"
time=2025-09-27T19:35:54.290+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97"
time=2025-09-27T19:35:54.540+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97"
time=2025-09-27T19:35:54.791+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97"
time=2025-09-27T19:35:55.041+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98"
time=2025-09-27T19:35:55.292+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98"
time=2025-09-27T19:35:55.542+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98"
time=2025-09-27T19:35:55.792+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98"
time=2025-09-27T19:35:56.042+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98"
time=2025-09-27T19:35:56.293+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98"
time=2025-09-27T19:35:56.543+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98"
time=2025-09-27T19:35:56.794+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99"
time=2025-09-27T19:35:57.044+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99"
time=2025-09-27T19:35:57.295+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99"
time=2025-09-27T19:35:57.545+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99"
time=2025-09-27T19:35:57.795+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99"
time=2025-09-27T19:35:58.045+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99"
time=2025-09-27T19:35:58.295+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99"
time=2025-09-27T19:35:58.546+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99"
time=2025-09-27T19:35:58.796+02:00 level=DEBUG source=server.go:1295 msg="model load progress 1.00"
time=2025-09-27T19:35:59.046+02:00 level=DEBUG source=server.go:1295 msg="model load progress 1.00"
time=2025-09-27T19:35:59.297+02:00 level=DEBUG source=server.go:1295 msg="model load progress 1.00"
time=2025-09-27T19:35:59.547+02:00 level=DEBUG source=server.go:1295 msg="model load progress 1.00"
time=2025-09-27T19:35:59.766+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-09-27T19:35:59.797+02:00 level=INFO source=server.go:1289 msg="llama runner started in 347.30 seconds"
time=2025-09-27T19:35:59.798+02:00 level=DEBUG source=sched.go:482 msg="finished setting up" runner.name=registry.ollama.ai/library/Qwen3-Coder-30B-A3B:latest runner.inference=cuda runner.devices=1 runner.size="13.2 GiB" runner.vram="13.2 GiB" runner.parallel=1 runner.pid=14444 runner.model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 runner.num_ctx=8192
[GIN] 2025/09/27 - 19:35:59 | 200 |         5m47s |       127.0.0.1 | POST     "/api/generate"
time=2025-09-27T19:35:59.799+02:00 level=DEBUG source=sched.go:490 msg="context for request finished"
time=2025-09-27T19:35:59.799+02:00 level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/Qwen3-Coder-30B-A3B:latest runner.inference=cuda runner.devices=1 runner.size="13.2 GiB" runner.vram="13.2 GiB" runner.parallel=1 runner.pid=14444 runner.model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 runner.num_ctx=8192 duration=2562047h47m16.854775807s
time=2025-09-27T19:35:59.799+02:00 level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=registry.ollama.ai/library/Qwen3-Coder-30B-A3B:latest runner.inference=cuda runner.devices=1 runner.size="13.2 GiB" runner.vram="13.2 GiB" runner.parallel=1 runner.pid=14444 runner.model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 runner.num_ctx=8192 refCount=0
time=2025-09-27T19:37:12.545+02:00 level=DEBUG source=sched.go:580 msg="evaluating already loaded" model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1
time=2025-09-27T19:37:12.547+02:00 level=DEBUG source=server.go:1388 msg="completion request" images=0 prompt=55 format=""
time=2025-09-27T19:37:12.557+02:00 level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=0 prompt=9 used=0 remaining=9
[GIN] 2025/09/27 - 19:37:13 | 200 |    809.1608ms |       127.0.0.1 | POST     "/api/chat"
time=2025-09-27T19:37:13.283+02:00 level=DEBUG source=sched.go:377 msg="context for request finished" runner.name=registry.ollama.ai/library/Qwen3-Coder-30B-A3B:latest runner.inference=cuda runner.devices=1 runner.size="13.2 GiB" runner.vram="13.2 GiB" runner.parallel=1 runner.pid=14444 runner.model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 runner.num_ctx=8192
time=2025-09-27T19:37:13.284+02:00 level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/Qwen3-Coder-30B-A3B:latest runner.inference=cuda runner.devices=1 runner.size="13.2 GiB" runner.vram="13.2 GiB" runner.parallel=1 runner.pid=14444 runner.model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 runner.num_ctx=8192 duration=2562047h47m16.854775807s
time=2025-09-27T19:37:13.284+02:00 level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=registry.ollama.ai/library/Qwen3-Coder-30B-A3B:latest runner.inference=cuda runner.devices=1 runner.size="13.2 GiB" runner.vram="13.2 GiB" runner.parallel=1 runner.pid=14444 runner.model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 runner.num_ctx=8192 refCount=0

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.12.3
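
To put a number on the load speed from the debug log above, the `model load progress` lines can be turned into a rough throughput estimate. This is a minimal sketch, not part of Ollama: the helper names `parse_progress` and `estimate_rate_mib_s` are hypothetical, and it assumes that the progress value is the fraction of the model already loaded and that `runner.size` (13.2 GiB in this log) approximates the number of bytes read from disk. Neither assumption is confirmed by the log itself.

```python
import re
from datetime import datetime

# Matches the debug lines emitted while the model loads, e.g.
# time=2025-09-27T19:34:56.222+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"
PROGRESS_RE = re.compile(
    r'time=(\S+) level=DEBUG source=server\.go:\d+ msg="model load progress ([0-9.]+)"'
)


def parse_progress(lines):
    """Return (timestamp, progress) pairs for every progress line found."""
    samples = []
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match:
            samples.append((datetime.fromisoformat(match.group(1)), float(match.group(2))))
    return samples


def estimate_rate_mib_s(samples, total_gib):
    """Estimate MiB/s between the first and last sample.

    Assumption: progress is a linear fraction of total_gib (taken from runner.size).
    Returns None when the samples do not span any time or progress.
    """
    if len(samples) < 2:
        return None
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    seconds = (t1 - t0).total_seconds()
    if seconds <= 0 or p1 <= p0:
        return None
    return (p1 - p0) * total_gib * 1024 / seconds


if __name__ == "__main__":
    # Two lines copied from the log above: first 0.74 sample and first 1.00 sample.
    log = [
        'time=2025-09-27T19:34:56.222+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74"',
        'time=2025-09-27T19:35:58.796+02:00 level=DEBUG source=server.go:1295 msg="model load progress 1.00"',
    ]
    rate = estimate_rate_mib_s(parse_progress(log), total_gib=13.2)
    print(f"estimated rate over this window: {rate:.1f} MiB/s")
```

Feeding the full log instead of two lines gives the rate over whatever window the log covers. As a cross-check under the same assumption, 13.2 GiB over the reported 347.30 seconds averages to roughly 38.9 MiB/s for the whole load.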

Originally created by @asdnemasd on GitHub (Sep 27, 2025). Original GitHub issue: https://github.com/ollama/ollama/issues/12431 ### What is the issue? With the Qwen3-Coder-30B-A3B model and Ollama v0.12.1, the model loads at around 100 MB/s. However, with the latest version, the model only loads at around 30 MB/s, which is significantly slower. In both cases, the model loads from an HDD. ### Relevant log output ```shell time=2025-09-27T19:30:06.369+02:00 level=INFO source=routes.go:1475 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:2562047h47m16.854775807s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:D:\\ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:true OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]" time=2025-09-27T19:30:06.437+02:00 level=INFO source=images.go:518 msg="total blobs: 36" time=2025-09-27T19:30:06.439+02:00 level=INFO source=images.go:525 msg="total unused blobs removed: 0" time=2025-09-27T19:30:06.440+02:00 level=INFO source=routes.go:1528 msg="Listening on 127.0.0.1:11434 (version 0.12.3)" time=2025-09-27T19:30:06.440+02:00 level=DEBUG source=sched.go:121 msg="starting llm scheduler" time=2025-09-27T19:30:06.440+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs" 
time=2025-09-27T19:30:06.440+02:00 level=INFO source=gpu_windows.go:167 msg=packages count=1 time=2025-09-27T19:30:06.441+02:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=6 efficiency=0 threads=6 time=2025-09-27T19:30:06.441+02:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA" time=2025-09-27T19:30:06.441+02:00 level=DEBUG source=gpu.go:520 msg="Searching for GPU library" name=nvml.dll time=2025-09-27T19:30:06.441+02:00 level=DEBUG source=gpu.go:544 msg="gpu library search" globs="[C:\\Users\\asd\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\nvml.dll C:\\Program Files\\ImageMagick-7.1.2-Q16-HDRI\\nvml.dll C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.9\\bin\\nvml.dll C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.9\\libnvvp\\nvml.dll C:\\Program Files\\ImageMagick-7.1.1-Q16-HDRI\\nvml.dll C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.8\\bin\\nvml.dll C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.8\\libnvvp\\nvml.dll c:\\program files\\nvidia gpu computing toolkit\\cuda\\v11.8\\bin\\nvml.dll c:\\program files\\nvidia gpu computing toolkit\\cuda\\v11.8\\libnvvp\\nvml.dll c:\\program files\\eclipse adoptium\\jre-21.0.6.7-hotspot\\bin\\nvml.dll c:\\program files\\lapce\\nvml.dll c:\\program files\\imagemagick-7.1.1-q16-hdri\\nvml.dll c:\\p\\ffmpeg-6.1.1-full_build\\bin\\nvml.dll c:\\program files\\dotnet\\nvml.dll c:\\windows\\system32\\windowspowershell\\v1.0\\nvml.dll c:\\program files\\gource\\cmd\\nvml.dll c:\\program files\\nodejs\\nvml.dll c:\\program files\\git\\cmd\\nvml.dll c:\\program files\\exiftoolgui\\nvml.dll c:\\program files\\git\\cmd\\nvml.dll c:\\program files\\wireguard\\nvml.dll C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common\\nvml.dll C:\\Program Files\\NVIDIA Corporation\\NVIDIA App\\NvDLISR\\nvml.dll C:\\Program Files\\Sunshine\\nvml.dll C:\\Program Files\\Sunshine\\tools\\nvml.dll C:\\Program Files\\Git\\cmd\\nvml.dll C:\\Program 
Files (x86)\\ZeroTier\\One\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Launcher\\nvml.dll C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.8\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Python39\\Scripts\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Python39\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Python312\\Scripts\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Python312\\nvml.dll C:\\Users\\asd\\AppData\\Local\\UniGetUI\\Chocolatey\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Python310\\Scripts\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Python\\Python310\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\WingetUI\\choco-cli\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Microsoft\\WindowsApps\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\VSCodium\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Hyper\\resources\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Programs\\Ollama\\nvml.dll C:\\Users\\asd\\.dotnet\\tools\\nvml.dll C:\\msys64\\mingw64\\bin\\nvml.dll C:\\msys64\\usr\\bin\\nvml.dll C:\\Windows\\System32\\nvml.dll C:\\p\\path\\nvml.dll C:\\Program Files\\Airshipper\\nvml.dll C:\\Users\\asd\\.dotnet\\tools\\nvml.dll c:\\users\\asd\\.local\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Roaming\\npm\\nvml.dll C:\\Users\\asd\\.lmstudio\\bin\\nvml.dll C:\\Users\\asd\\AppData\\Local\\Microsoft\\WinGet\\Packages\\Gyan.FFmpeg.Essentials_Microsoft.Winget.Source_8wekyb3d8bbwe\\ffmpeg-7.1.1-essentials_build\\bin\\nvml.dll C:\\Users\\asd\\.dotnet\\tools\\nvml.dll C:\\Users\\asd\\nvml.dll c:\\Windows\\System32\\nvml.dll]" time=2025-09-27T19:30:06.442+02:00 level=DEBUG source=gpu.go:548 msg="skipping PhysX cuda library path" path="C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common\\nvml.dll" time=2025-09-27T19:30:06.444+02:00 level=DEBUG source=gpu.go:577 msg="discovered GPU libraries" paths="[C:\\Windows\\System32\\nvml.dll 
c:\\Windows\\System32\\nvml.dll]" time=2025-09-27T19:30:06.466+02:00 level=DEBUG source=gpu.go:111 msg="nvidia-ml loaded" library=C:\Windows\System32\nvml.dll time=2025-09-27T19:30:06.466+02:00 level=DEBUG source=gpu.go:520 msg="Searching for GPU library" name=nvcuda.dll time=2025-09-27T19:30:06.468+02:00 level=DEBUG source=gpu.go:548 msg="skipping PhysX cuda library path" path="C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common\\nvcuda.dll" time=2025-09-27T19:30:06.471+02:00 level=DEBUG source=gpu.go:577 msg="discovered GPU libraries" paths=[C:\Windows\System32\nvcuda.dll] initializing C:\Windows\System32\nvcuda.dll dlsym: cuInit - 00007FFADD1D1F80 dlsym: cuDriverGetVersion - 00007FFADD1D2020 dlsym: cuDeviceGetCount - 00007FFADD1D2816 dlsym: cuDeviceGet - 00007FFADD1D2810 dlsym: cuDeviceGetAttribute - 00007FFADD1D2170 dlsym: cuDeviceGetUuid - 00007FFADD1D2822 dlsym: cuDeviceGetName - 00007FFADD1D281C dlsym: cuCtxCreate_v3 - 00007FFADD1D2894 dlsym: cuMemGetInfo_v2 - 00007FFADD1D2996 dlsym: cuCtxDestroy - 00007FFADD1D28A6 calling cuInit calling cuDriverGetVersion raw version 0x2f3a CUDA driver version: 12.9 calling cuDeviceGetCount device count 1 time=2025-09-27T19:30:06.495+02:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=C:\Windows\System32\nvcuda.dll [GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0] CUDA totalMem 16379mb [GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0] CUDA freeMem 15225mb [GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0] Compute Capability 8.9 time=2025-09-27T19:30:06.597+02:00 level=WARN source=cuda_common.go:60 msg="old CUDA driver detected - please upgrade to a newer driver for best performance" version=12.9 time=2025-09-27T19:30:06.600+02:00 level=DEBUG source=amd_windows.go:34 msg="unable to load amdhip64_6.dll, please make sure to upgrade to the latest amd driver: The specified module could not be found."
releasing cuda driver library releasing nvml library time=2025-09-27T19:30:06.602+02:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 library=cuda variant=v12 compute=8.9 driver=12.9 name="NVIDIA GeForce RTX 4060 Ti" total="16.0 GiB" available="14.9 GiB" time=2025-09-27T19:30:06.602+02:00 level=INFO source=routes.go:1569 msg="entering low vram mode" "total vram"="16.0 GiB" threshold="20.0 GiB" [GIN] 2025/09/27 - 19:30:12 | 200 | 0s | 127.0.0.1 | HEAD "/" time=2025-09-27T19:30:12.270+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 [GIN] 2025/09/27 - 19:30:12 | 200 | 102.5496ms | 127.0.0.1 | POST "/api/show" time=2025-09-27T19:30:12.356+02:00 level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="23.9 GiB" before.free="18.4 GiB" before.free_swap="24.6 GiB" now.total="23.9 GiB" now.free="18.4 GiB" now.free_swap="24.5 GiB" time=2025-09-27T19:30:12.377+02:00 level=DEBUG source=gpu.go:460 msg="updating cuda memory data" gpu=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 name="NVIDIA GeForce RTX 4060 Ti" overhead="0 B" before.total="16.0 GiB" before.free="14.9 GiB" now.total="16.0 GiB" now.free="14.7 GiB" now.used="1.3 GiB" releasing nvml library time=2025-09-27T19:30:12.379+02:00 level=DEBUG source=sched.go:188 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1 time=2025-09-27T19:30:12.394+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-09-27T19:30:12.397+02:00 level=DEBUG source=sched.go:208 msg="loading first model" model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 time=2025-09-27T19:30:12.453+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-09-27T19:30:12.456+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type 
default=0 time=2025-09-27T19:30:12.456+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2025-09-27T19:30:12.457+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2025-09-27T19:30:12.457+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2025-09-27T19:30:12.460+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1 time=2025-09-27T19:30:12.460+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true time=2025-09-27T19:30:12.461+02:00 level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="23.9 GiB" before.free="18.4 GiB" before.free_swap="24.5 GiB" now.total="23.9 GiB" now.free="18.4 GiB" now.free_swap="24.5 GiB" time=2025-09-27T19:30:12.486+02:00 level=DEBUG source=gpu.go:460 msg="updating cuda memory data" gpu=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 name="NVIDIA GeForce RTX 4060 Ti" overhead="0 B" before.total="16.0 GiB" before.free="14.7 GiB" now.total="16.0 GiB" now.free="14.7 GiB" now.used="1.3 GiB" releasing nvml library time=2025-09-27T19:30:12.489+02:00 level=INFO source=server.go:217 msg="enabling flash attention" time=2025-09-27T19:30:12.499+02:00 level=DEBUG source=server.go:324 msg="adding gpu library" path=C:\Users\asd\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12 time=2025-09-27T19:30:12.499+02:00 level=DEBUG source=server.go:332 msg="adding gpu dependency paths" paths=[C:\Users\asd\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12] time=2025-09-27T19:30:12.500+02:00 level=INFO source=server.go:399 msg="starting runner" cmd="C:\\Users\\asd\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model 
D:\\ollama\\models\\blobs\\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 --port 60264" time=2025-09-27T19:30:12.507+02:00 level=INFO source=server.go:672 msg="loading model" "model layers"=49 requested=99 time=2025-09-27T19:30:12.508+02:00 level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="23.9 GiB" before.free="18.4 GiB" before.free_swap="24.5 GiB" now.total="23.9 GiB" now.free="18.4 GiB" now.free_swap="24.5 GiB" time=2025-09-27T19:30:12.517+02:00 level=DEBUG source=gpu.go:460 msg="updating cuda memory data" gpu=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 name="NVIDIA GeForce RTX 4060 Ti" overhead="0 B" before.total="16.0 GiB" before.free="14.7 GiB" now.total="16.0 GiB" now.free="14.7 GiB" now.used="1.3 GiB" releasing nvml library time=2025-09-27T19:30:12.519+02:00 level=INFO source=server.go:678 msg="system memory" total="23.9 GiB" free="18.4 GiB" free_swap="24.5 GiB" time=2025-09-27T19:30:12.519+02:00 level=INFO source=server.go:686 msg="gpu memory" id=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 available="14.2 GiB" free="14.7 GiB" minimum="457.0 MiB" overhead="0 B" time=2025-09-27T19:30:12.548+02:00 level=INFO source=runner.go:1252 msg="starting ollama engine" time=2025-09-27T19:30:12.559+02:00 level=INFO source=runner.go:1287 msg="Server listening on 127.0.0.1:60264" time=2025-09-27T19:30:12.563+02:00 level=INFO source=runner.go:1171 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:6 GPULayers:49[ID:GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2025-09-27T19:30:12.595+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-09-27T19:30:12.598+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default="" time=2025-09-27T19:30:12.599+02:00 level=INFO source=ggml.go:131 
msg="" architecture=qwen3moe file_type=Q3_K_S name=Qwen3-Coder-30B-A3B-Instruct description="" num_tensors=579 num_key_values=45 time=2025-09-27T19:30:12.600+02:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\asd\AppData\Local\Programs\Ollama\lib\ollama load_backend: loaded CPU backend from C:\Users\asd\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll time=2025-09-27T19:30:13.385+02:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\asd\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes, ID: GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 load_backend: loaded CUDA backend from C:\Users\asd\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12\ggml-cuda.dll time=2025-09-27T19:30:14.894+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang) time=2025-09-27T19:30:14.901+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0 time=2025-09-27T19:30:14.901+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2025-09-27T19:30:14.901+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2025-09-27T19:30:14.901+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2025-09-27T19:30:14.901+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1 
time=2025-09-27T19:30:14.901+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true time=2025-09-27T19:30:15.008+02:00 level=DEBUG source=ggml.go:794 msg="compute graph" nodes=2982 splits=2 time=2025-09-27T19:30:15.009+02:00 level=DEBUG source=backend.go:310 msg="model weights" device=CUDA0 size="12.2 GiB" time=2025-09-27T19:30:15.010+02:00 level=DEBUG source=backend.go:315 msg="model weights" device=CPU size="127.5 MiB" time=2025-09-27T19:30:15.010+02:00 level=DEBUG source=backend.go:321 msg="kv cache" device=CUDA0 size="768.0 MiB" time=2025-09-27T19:30:15.010+02:00 level=DEBUG source=backend.go:332 msg="compute graph" device=CUDA0 size="92.0 MiB" time=2025-09-27T19:30:15.011+02:00 level=DEBUG source=backend.go:337 msg="compute graph" device=CPU size="4.0 MiB" time=2025-09-27T19:30:15.012+02:00 level=DEBUG source=backend.go:342 msg="total memory" size="13.2 GiB" time=2025-09-27T19:30:15.012+02:00 level=DEBUG source=server.go:717 msg=memory success=true required.InputWeights=133703680U required.CPU.Graph=4194304U required.CUDA0.ID=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 required.CUDA0.Weights="[268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 268698752U 255260672U]" required.CUDA0.Cache="[16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 
16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 0U]" required.CUDA0.Graph=96471168U time=2025-09-27T19:30:15.013+02:00 level=DEBUG source=server.go:894 msg="available gpu" id=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 "available layer vram"="14.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="92.0 MiB" time=2025-09-27T19:30:15.013+02:00 level=DEBUG source=server.go:728 msg="new layout created" layers="49[ID:GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 Layers:49(0..48)]" time=2025-09-27T19:30:15.013+02:00 level=INFO source=runner.go:1171 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:6 GPULayers:49[ID:GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2025-09-27T19:30:15.041+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-09-27T19:30:15.081+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0 time=2025-09-27T19:30:15.081+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2025-09-27T19:30:15.081+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2025-09-27T19:30:15.081+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2025-09-27T19:30:15.081+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1 time=2025-09-27T19:30:15.081+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true 
time=2025-09-27T19:30:15.376+02:00 level=DEBUG source=ggml.go:794 msg="compute graph" nodes=2982 splits=2 time=2025-09-27T19:30:15.377+02:00 level=DEBUG source=backend.go:310 msg="model weights" device=CUDA0 size="12.2 GiB" time=2025-09-27T19:30:15.377+02:00 level=DEBUG source=backend.go:315 msg="model weights" device=CPU size="127.5 MiB" time=2025-09-27T19:30:15.378+02:00 level=DEBUG source=backend.go:321 msg="kv cache" device=CUDA0 size="768.0 MiB" time=2025-09-27T19:30:15.378+02:00 level=DEBUG source=backend.go:332 msg="compute graph" device=CUDA0 size="92.0 MiB" time=2025-09-27T19:30:15.378+02:00 level=DEBUG source=backend.go:337 msg="compute graph" device=CPU size="4.0 MiB" time=2025-09-27T19:30:15.379+02:00 level=DEBUG source=backend.go:342 msg="total memory" size="13.2 GiB" time=2025-09-27T19:30:15.379+02:00 level=DEBUG source=server.go:717 msg=memory success=true required.InputWeights=133703680A required.CPU.Graph=4194304A required.CUDA0.ID=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 required.CUDA0.Weights="[268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 268698752A 255260672A]" required.CUDA0.Cache="[16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 
16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 16777216A 0U]" required.CUDA0.Graph=96471168A time=2025-09-27T19:30:15.380+02:00 level=DEBUG source=server.go:894 msg="available gpu" id=GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 "available layer vram"="14.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="92.0 MiB" time=2025-09-27T19:30:15.380+02:00 level=DEBUG source=server.go:728 msg="new layout created" layers="49[ID:GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 Layers:49(0..48)]" time=2025-09-27T19:30:15.381+02:00 level=INFO source=runner.go:1171 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:6 GPULayers:49[ID:GPU-d0f9ed85-a082-2e7e-f1ca-ec56c347fcb0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2025-09-27T19:30:15.381+02:00 level=INFO source=backend.go:310 msg="model weights" device=CUDA0 size="12.2 GiB" time=2025-09-27T19:30:15.381+02:00 level=INFO source=backend.go:315 msg="model weights" device=CPU size="127.5 MiB" time=2025-09-27T19:30:15.382+02:00 level=INFO source=backend.go:321 msg="kv cache" device=CUDA0 size="768.0 MiB" time=2025-09-27T19:30:15.381+02:00 level=INFO source=ggml.go:487 msg="offloading 48 repeating layers to GPU" time=2025-09-27T19:30:15.382+02:00 level=INFO source=ggml.go:493 msg="offloading output layer to GPU" time=2025-09-27T19:30:15.382+02:00 level=INFO source=ggml.go:498 msg="offloaded 49/49 layers to GPU" time=2025-09-27T19:30:15.382+02:00 level=INFO source=backend.go:332 msg="compute graph" device=CUDA0 size="92.0 MiB" time=2025-09-27T19:30:15.383+02:00 level=INFO source=backend.go:337 msg="compute graph" device=CPU size="4.0 MiB" time=2025-09-27T19:30:15.383+02:00 level=INFO source=backend.go:342 msg="total memory" size="13.2 GiB" time=2025-09-27T19:30:15.384+02:00 level=INFO source=sched.go:470 msg="loaded runners" count=1 time=2025-09-27T19:30:15.386+02:00 level=INFO 
source=server.go:1251 msg="waiting for llama runner to start responding" time=2025-09-27T19:30:15.386+02:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model" time=2025-09-27T19:30:15.387+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.00" time=2025-09-27T19:30:15.637+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.00" time=2025-09-27T19:30:15.888+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.00" time=2025-09-27T19:30:16.138+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.00" time=2025-09-27T19:30:16.388+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.00" time=2025-09-27T19:34:42.457+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70" time=2025-09-27T19:34:42.708+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70" time=2025-09-27T19:34:42.958+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70" time=2025-09-27T19:34:43.208+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70" time=2025-09-27T19:34:43.459+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70" time=2025-09-27T19:34:43.709+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70" time=2025-09-27T19:34:43.959+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70" time=2025-09-27T19:34:44.209+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70" time=2025-09-27T19:34:44.459+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:44.709+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:44.959+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:45.209+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:45.460+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.71" time=2025-09-27T19:34:45.710+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:45.960+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:46.211+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:46.461+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:46.712+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:46.962+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:47.212+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:47.463+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:47.713+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:47.963+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.71" time=2025-09-27T19:34:48.213+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:48.463+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:48.713+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:48.964+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:49.214+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:49.464+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:49.714+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:49.965+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:50.215+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:50.466+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.72" time=2025-09-27T19:34:50.716+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:50.967+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:51.217+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:51.467+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:51.717+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72" time=2025-09-27T19:34:51.968+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:52.218+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:52.468+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:52.718+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:52.969+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:53.219+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:53.469+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:53.720+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:53.970+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:54.220+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:54.470+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:54.721+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:54.971+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:55.221+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:55.472+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.73" time=2025-09-27T19:34:55.722+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.73" time=2025-09-27T19:34:55.972+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:56.222+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:56.473+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:56.723+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:56.974+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:57.224+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:57.474+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:57.724+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:57.975+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:58.225+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:58.476+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:58.726+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:58.976+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:59.226+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:59.476+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.74" time=2025-09-27T19:34:59.727+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:34:59.977+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:35:00.227+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:35:00.477+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.75" time=2025-09-27T19:35:00.728+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:35:00.978+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:35:01.228+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:35:01.478+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:35:01.728+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:35:01.979+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:35:02.229+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:35:02.479+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:35:02.730+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:35:02.980+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:35:03.230+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.75" time=2025-09-27T19:35:03.481+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:03.731+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:03.981+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:04.232+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:04.482+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:04.732+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:04.983+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:05.233+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:05.483+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.76" time=2025-09-27T19:35:05.734+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:05.984+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:06.234+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:06.484+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:06.734+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:06.984+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:07.234+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:07.485+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.76" time=2025-09-27T19:35:07.735+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:07.986+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:08.236+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:08.486+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:08.736+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:08.986+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:09.237+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:09.487+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:09.737+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:09.987+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:10.237+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:10.488+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.77" time=2025-09-27T19:35:10.738+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:10.988+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:11.239+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:11.489+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:11.739+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77" time=2025-09-27T19:35:11.990+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:12.240+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:12.490+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:12.741+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:12.991+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:13.241+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:13.492+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:13.742+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:13.992+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:14.242+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:14.492+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:14.743+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:14.993+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:15.243+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:15.493+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.78" time=2025-09-27T19:35:15.743+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:15.994+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.78" time=2025-09-27T19:35:16.244+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79" time=2025-09-27T19:35:16.494+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79" time=2025-09-27T19:35:16.744+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79" time=2025-09-27T19:35:16.994+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79" time=2025-09-27T19:35:17.245+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79" time=2025-09-27T19:35:17.495+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79" time=2025-09-27T19:35:17.745+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79" time=2025-09-27T19:35:17.995+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79" time=2025-09-27T19:35:18.245+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79" time=2025-09-27T19:35:18.496+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.79" time=2025-09-27T19:35:18.746+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80" time=2025-09-27T19:35:18.996+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80" time=2025-09-27T19:35:19.247+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80" time=2025-09-27T19:35:19.497+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80" time=2025-09-27T19:35:19.748+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80" time=2025-09-27T19:35:19.998+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80" time=2025-09-27T19:35:20.249+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80" time=2025-09-27T19:35:20.499+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.80" time=2025-09-27T19:35:20.749+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.80" time=2025-09-27T19:35:21.000+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81" time=2025-09-27T19:35:21.250+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81" time=2025-09-27T19:35:21.501+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81" time=2025-09-27T19:35:21.751+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81" time=2025-09-27T19:35:22.001+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81" time=2025-09-27T19:35:22.251+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81" time=2025-09-27T19:35:22.502+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81" time=2025-09-27T19:35:22.752+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81" time=2025-09-27T19:35:23.002+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82" time=2025-09-27T19:35:23.253+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82" time=2025-09-27T19:35:23.503+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82" time=2025-09-27T19:35:23.754+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82" time=2025-09-27T19:35:24.004+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82" time=2025-09-27T19:35:24.254+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82" time=2025-09-27T19:35:24.504+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82" time=2025-09-27T19:35:24.754+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.82" time=2025-09-27T19:35:25.004+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83" time=2025-09-27T19:35:25.254+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83" time=2025-09-27T19:35:25.505+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.83" time=2025-09-27T19:35:25.755+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83" time=2025-09-27T19:35:26.005+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83" time=2025-09-27T19:35:26.256+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83" time=2025-09-27T19:35:26.506+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83" time=2025-09-27T19:35:26.756+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83" time=2025-09-27T19:35:27.007+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.83" time=2025-09-27T19:35:27.257+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84" time=2025-09-27T19:35:27.508+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84" time=2025-09-27T19:35:27.758+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84" time=2025-09-27T19:35:28.008+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84" time=2025-09-27T19:35:28.258+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84" time=2025-09-27T19:35:28.509+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84" time=2025-09-27T19:35:28.759+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84" time=2025-09-27T19:35:29.009+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.84" time=2025-09-27T19:35:29.259+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85" time=2025-09-27T19:35:29.510+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85" time=2025-09-27T19:35:29.760+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85" time=2025-09-27T19:35:30.010+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85" time=2025-09-27T19:35:30.260+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85" time=2025-09-27T19:35:30.511+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.85" time=2025-09-27T19:35:30.761+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85" time=2025-09-27T19:35:31.012+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86" time=2025-09-27T19:35:31.262+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86" time=2025-09-27T19:35:31.513+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86" time=2025-09-27T19:35:31.763+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86" time=2025-09-27T19:35:32.013+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86" time=2025-09-27T19:35:32.264+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86" time=2025-09-27T19:35:32.514+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86" time=2025-09-27T19:35:32.764+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.86" time=2025-09-27T19:35:33.014+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87" time=2025-09-27T19:35:33.265+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87" time=2025-09-27T19:35:33.515+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87" time=2025-09-27T19:35:33.765+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87" time=2025-09-27T19:35:34.016+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87" time=2025-09-27T19:35:34.266+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87" time=2025-09-27T19:35:34.516+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87" time=2025-09-27T19:35:34.767+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.87" time=2025-09-27T19:35:35.017+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88" time=2025-09-27T19:35:35.268+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88" time=2025-09-27T19:35:35.518+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.88" time=2025-09-27T19:35:35.768+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88" time=2025-09-27T19:35:36.018+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88" time=2025-09-27T19:35:36.269+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88" time=2025-09-27T19:35:36.519+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88" time=2025-09-27T19:35:36.769+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.88" time=2025-09-27T19:35:37.020+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89" time=2025-09-27T19:35:37.270+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89" time=2025-09-27T19:35:37.520+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89" time=2025-09-27T19:35:37.771+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89" time=2025-09-27T19:35:38.021+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89" time=2025-09-27T19:35:38.271+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89" time=2025-09-27T19:35:38.521+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89" time=2025-09-27T19:35:38.771+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.89" time=2025-09-27T19:35:39.021+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90" time=2025-09-27T19:35:39.272+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90" time=2025-09-27T19:35:39.522+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90" time=2025-09-27T19:35:39.772+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90" time=2025-09-27T19:35:40.022+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90" time=2025-09-27T19:35:40.273+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90" time=2025-09-27T19:35:40.523+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.90" time=2025-09-27T19:35:40.774+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90" time=2025-09-27T19:35:41.024+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91" time=2025-09-27T19:35:41.274+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91" time=2025-09-27T19:35:41.524+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91" time=2025-09-27T19:35:41.775+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91" time=2025-09-27T19:35:42.025+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91" time=2025-09-27T19:35:42.276+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91" time=2025-09-27T19:35:42.526+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.91" time=2025-09-27T19:35:42.777+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92" time=2025-09-27T19:35:43.027+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92" time=2025-09-27T19:35:43.277+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92" time=2025-09-27T19:35:43.527+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92" time=2025-09-27T19:35:43.778+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92" time=2025-09-27T19:35:44.028+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92" time=2025-09-27T19:35:44.278+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92" time=2025-09-27T19:35:44.529+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.92" time=2025-09-27T19:35:44.779+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93" time=2025-09-27T19:35:45.029+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93" time=2025-09-27T19:35:45.280+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93" time=2025-09-27T19:35:45.530+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.93" time=2025-09-27T19:35:45.780+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93" time=2025-09-27T19:35:46.031+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93" time=2025-09-27T19:35:46.281+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93" time=2025-09-27T19:35:46.531+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.93" time=2025-09-27T19:35:46.782+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94" time=2025-09-27T19:35:47.032+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94" time=2025-09-27T19:35:47.282+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94" time=2025-09-27T19:35:47.533+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94" time=2025-09-27T19:35:47.783+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94" time=2025-09-27T19:35:48.033+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94" time=2025-09-27T19:35:48.284+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94" time=2025-09-27T19:35:48.534+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94" time=2025-09-27T19:35:48.784+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95" time=2025-09-27T19:35:49.035+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95" time=2025-09-27T19:35:49.285+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95" time=2025-09-27T19:35:49.535+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95" time=2025-09-27T19:35:49.785+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95" time=2025-09-27T19:35:50.036+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95" time=2025-09-27T19:35:50.286+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.95" time=2025-09-27T19:35:50.536+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.95" time=2025-09-27T19:35:50.787+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96" time=2025-09-27T19:35:51.037+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96" time=2025-09-27T19:35:51.287+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96" time=2025-09-27T19:35:51.537+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96" time=2025-09-27T19:35:51.787+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96" time=2025-09-27T19:35:52.038+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96" time=2025-09-27T19:35:52.288+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96" time=2025-09-27T19:35:52.538+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96" time=2025-09-27T19:35:52.788+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.96" time=2025-09-27T19:35:53.039+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97" time=2025-09-27T19:35:53.289+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97" time=2025-09-27T19:35:53.539+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97" time=2025-09-27T19:35:53.789+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97" time=2025-09-27T19:35:54.040+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97" time=2025-09-27T19:35:54.290+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97" time=2025-09-27T19:35:54.540+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97" time=2025-09-27T19:35:54.791+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.97" time=2025-09-27T19:35:55.041+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98" time=2025-09-27T19:35:55.292+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98" time=2025-09-27T19:35:55.542+02:00 level=DEBUG source=server.go:1295 msg="model 
load progress 0.98" time=2025-09-27T19:35:55.792+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98" time=2025-09-27T19:35:56.042+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98" time=2025-09-27T19:35:56.293+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98" time=2025-09-27T19:35:56.543+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98" time=2025-09-27T19:35:56.794+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99" time=2025-09-27T19:35:57.044+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99" time=2025-09-27T19:35:57.295+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99" time=2025-09-27T19:35:57.545+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99" time=2025-09-27T19:35:57.795+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99" time=2025-09-27T19:35:58.045+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99" time=2025-09-27T19:35:58.295+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99" time=2025-09-27T19:35:58.546+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.99" time=2025-09-27T19:35:58.796+02:00 level=DEBUG source=server.go:1295 msg="model load progress 1.00" time=2025-09-27T19:35:59.046+02:00 level=DEBUG source=server.go:1295 msg="model load progress 1.00" time=2025-09-27T19:35:59.297+02:00 level=DEBUG source=server.go:1295 msg="model load progress 1.00" time=2025-09-27T19:35:59.547+02:00 level=DEBUG source=server.go:1295 msg="model load progress 1.00" time=2025-09-27T19:35:59.766+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0 time=2025-09-27T19:35:59.797+02:00 level=INFO source=server.go:1289 msg="llama runner started in 347.30 seconds" time=2025-09-27T19:35:59.798+02:00 level=DEBUG source=sched.go:482 msg="finished setting up" 
runner.name=registry.ollama.ai/library/Qwen3-Coder-30B-A3B:latest runner.inference=cuda runner.devices=1 runner.size="13.2 GiB" runner.vram="13.2 GiB" runner.parallel=1 runner.pid=14444 runner.model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 runner.num_ctx=8192 [GIN] 2025/09/27 - 19:35:59 | 200 | 5m47s | 127.0.0.1 | POST "/api/generate" time=2025-09-27T19:35:59.799+02:00 level=DEBUG source=sched.go:490 msg="context for request finished" time=2025-09-27T19:35:59.799+02:00 level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/Qwen3-Coder-30B-A3B:latest runner.inference=cuda runner.devices=1 runner.size="13.2 GiB" runner.vram="13.2 GiB" runner.parallel=1 runner.pid=14444 runner.model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 runner.num_ctx=8192 duration=2562047h47m16.854775807s time=2025-09-27T19:35:59.799+02:00 level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=registry.ollama.ai/library/Qwen3-Coder-30B-A3B:latest runner.inference=cuda runner.devices=1 runner.size="13.2 GiB" runner.vram="13.2 GiB" runner.parallel=1 runner.pid=14444 runner.model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 runner.num_ctx=8192 refCount=0 time=2025-09-27T19:37:12.545+02:00 level=DEBUG source=sched.go:580 msg="evaluating already loaded" model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 time=2025-09-27T19:37:12.547+02:00 level=DEBUG source=server.go:1388 msg="completion request" images=0 prompt=55 format="" time=2025-09-27T19:37:12.557+02:00 level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=0 prompt=9 used=0 remaining=9 [GIN] 2025/09/27 - 19:37:13 | 200 | 809.1608ms | 127.0.0.1 | POST "/api/chat" time=2025-09-27T19:37:13.283+02:00 level=DEBUG 
source=sched.go:377 msg="context for request finished" runner.name=registry.ollama.ai/library/Qwen3-Coder-30B-A3B:latest runner.inference=cuda runner.devices=1 runner.size="13.2 GiB" runner.vram="13.2 GiB" runner.parallel=1 runner.pid=14444 runner.model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 runner.num_ctx=8192
time=2025-09-27T19:37:13.284+02:00 level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/Qwen3-Coder-30B-A3B:latest runner.inference=cuda runner.devices=1 runner.size="13.2 GiB" runner.vram="13.2 GiB" runner.parallel=1 runner.pid=14444 runner.model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 runner.num_ctx=8192 duration=2562047h47m16.854775807s
time=2025-09-27T19:37:13.284+02:00 level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=registry.ollama.ai/library/Qwen3-Coder-30B-A3B:latest runner.inference=cuda runner.devices=1 runner.size="13.2 GiB" runner.vram="13.2 GiB" runner.parallel=1 runner.pid=14444 runner.model=D:\ollama\models\blobs\sha256-17d51f5310e9a598e5ac914d30f401fb2d1bc3b6a06a846919099eac09364ae1 runner.num_ctx=8192 refCount=0
```

### OS

Windows

### GPU

Nvidia

### CPU

Intel

### Ollama version

0.12.3
GiteaMirror added the bug label 2026-04-22 17:14:41 -05:00

@jmorganca commented on GitHub (Sep 28, 2025):

Sorry about this issue. Will merge with https://github.com/ollama/ollama/issues/12428


@asdnemasd commented on GitHub (Sep 28, 2025):

> Sorry about this issue. Will merge with #12428

Yes, the names are similar, but they are different. https://github.com/ollama/ollama/issues/12428#issuecomment-3341510200


@rick-github commented on GitHub (Sep 28, 2025):

There's a gap in the log:

time=2025-09-27T19:30:16.388+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.00"
time=2025-09-27T19:34:42.457+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70"

Was it like this in the original log?

That aside, it took 5m44s to load the model, about 36 MB/s, in line with what you reported. I downloaded the model and loaded it with ollama 0.12.3 from an NVMe drive and didn't experience delays in loading. Your problem may be the same as #12048, where I/O contention from multiple goroutines reading concurrently from a slow device slows down loading.
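
The timings quoted here can be rechecked from the log timestamps. Below is a minimal Python sketch, not part of ollama: the helper names are made up for illustration, and the three sample lines are copied from the log in this issue. The model's on-disk size is not stated in the thread, so `throughput_mb_s` takes it as an input instead of assuming a value.

```python
import re
from datetime import datetime

# Matches ollama's "model load progress" debug lines as they appear in this issue.
PROGRESS = re.compile(r'time=(\S+) .*?msg="model load progress ([\d.]+)"')


def parse_progress(line):
    """Return (timestamp, progress) for a progress line, or None if it is not one."""
    m = PROGRESS.search(line)
    if m is None:
        return None
    return datetime.fromisoformat(m.group(1)), float(m.group(2))


def elapsed_seconds(first_line, last_line):
    """Seconds between two progress lines."""
    t0, _ = parse_progress(first_line)
    t1, _ = parse_progress(last_line)
    return (t1 - t0).total_seconds()


def throughput_mb_s(size_bytes, seconds):
    """Average read rate in MB/s (1 MB = 10**6 bytes); size_bytes is supplied by the caller."""
    return size_bytes / 1e6 / seconds


first = 'time=2025-09-27T19:30:16.388+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.00"'
gap = 'time=2025-09-27T19:34:42.457+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.70"'
last = 'time=2025-09-27T19:35:59.547+02:00 level=DEBUG source=server.go:1295 msg="model load progress 1.00"'

print(elapsed_seconds(first, gap))   # length of the gap in the posted log, in seconds
print(elapsed_seconds(first, last))  # first to last progress line, in seconds
```

The second figure is a few seconds short of the "llama runner started in 347.30 seconds" line because that one also counts startup before the first progress message.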

You can work around this by reducing the number of goroutines that ollama launches by setting `GOMAXPROCS` in the environment of the server. For example, even on my NVMe drive, adjusting that changes the time to load:

| GOMAXPROCS | 0.12.3 | 0.12.1 |
| --- | --- | --- |
| unset | 0m3.247s | 0m4.285s |
| 1 | 0m7.188s | 0m4.770s |
| 2 | 0m4.395s | 0m3.961s |
| 3 | 0m3.690s | 0m4.018s |
| 4 | 0m3.226s | 0m3.980s |
| 5 | 0m3.029s | 0m3.997s |
| 6 | 0m2.968s | 0m4.495s |
| 7 | 0m2.711s | 0m3.982s |
| 8 | 0m2.692s | 0m4.015s |
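
One way to try a given `GOMAXPROCS` value is to start the server from a small wrapper that sets the variable only for that process. This is a rough Python sketch under a few assumptions: the `ollama` binary is on `PATH`, no other ollama server is already running, and the helper names `server_env` and `start_server` are invented here. Setting the variable through the operating system's own environment settings should have the same effect.

```python
import os
import subprocess


def server_env(max_procs, base=None):
    """Copy the environment and set GOMAXPROCS for the server process only."""
    env = dict(os.environ if base is None else base)
    env["GOMAXPROCS"] = str(max_procs)
    return env


def start_server(max_procs=1):
    """Launch `ollama serve` with the adjusted environment and return the process handle."""
    return subprocess.Popen(["ollama", "serve"], env=server_env(max_procs))


# Usage (assumes ollama is installed and not already running):
#   proc = start_server(1)
#   ... load the model and time it ...
#   proc.terminate()
```

Which value works best depends on the storage device, so it is worth timing a few settings the way the table above does.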

@asdnemasd commented on GitHub (Sep 28, 2025):

> `GOMAXPROCS`

Thanks for the help. Setting GOMAXPROCS to 1 fixed it. You probably didn't experience the issue because you are using an SSD.

As for the log: the original was more than 150 000 characters, but GitHub only allows 65 000 characters, so I had to delete a large part of it.

Reference: github-starred/ollama#34016