[GH-ISSUE #12754] Title: ROCm GPU (RX 7900 XTX) Not Utilized on Windows 11 with iGPU Present - Fallback to CPU #54971

Closed
opened 2026-04-29 08:05:47 -05:00 by GiteaMirror · 7 comments
Owner

Originally created by @AwarePL on GitHub (Oct 23, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12754

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Environment:

OS: Windows 11 Professional (x64) Build 26100.6725

CPU: AMD Ryzen 9 7900X3D 12-Core Processor

GPU (Dedicated): AMD Radeon RX 7900 XTX (Navi 31 XTX, 24GB GDDR6)

GPU (Integrated): AMD Radeon Graphics (Raphael, 512MB shared)

RAM: 64 GB DDR5 @ 6000 MHz (EXPO enabled)

Motherboard: ASUS ROG CROSSHAIR X670E HERO

Ollama Version: 0.12.6 (as seen in logs)

AMD Driver Version: Adrenalin 25.9.1 (Clean install performed using DDU)

Steps to Reproduce:

Ensure Ollama is installed on a Windows system with both AMD integrated graphics and a dedicated AMD RDNA3 GPU (like RX 7900 XTX).

Open cmd as Administrator.

Stop the background Ollama service: net stop "Ollama Application"

Kill any remaining processes: taskkill /F /IM "ollama.exe" /T

Set environment variables to force dGPU usage and enable debugging. Attempts included various combinations, primarily:

set ROCR_VISIBLE_DEVICES=0

set HIP_VISIBLE_DEVICES=0 (Tried alone and in combination with ROCR_VISIBLE_DEVICES)

set OLLAMA_NUM_GPU=999 (To force loading layers)

set OLLAMA_DEBUG=1

Start the Ollama server in the same admin cmd window: ollama serve

Attempt to load and run any model (e.g., via ollama run ... in another terminal, or through an API call). The model used in the logs was hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL.
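For convenience, the steps above can be collected into a single elevated cmd session (a sketch; port and model names are taken from the logs in this report, and only one env-var combination is shown):

```bat
:: Run in an elevated cmd prompt (Administrator).
net stop "Ollama Application"
taskkill /F /IM "ollama.exe" /T

:: Force the dGPU and enable debug logging.
:: Combinations tried: ROCR only, HIP only, and both together.
set ROCR_VISIBLE_DEVICES=0
set HIP_VISIBLE_DEVICES=0
set OLLAMA_NUM_GPU=999
set OLLAMA_DEBUG=1

:: Start the server in this same window so it inherits the variables.
ollama serve
```

Then, in a second terminal: ollama run hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL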

Expected Behavior:

Ollama should detect and select the AMD Radeon RX 7900 XTX (identified as id=0 library=ROCm compute=gfx1100 in logs) as the primary inference device.

It should recognize the available VRAM (approx. 21+ GiB available according to initial discovery logs).

When loading a model with OLLAMA_NUM_GPU=999, it should offload all or most model layers to the RX 7900 XTX VRAM (e.g., offloaded 49/49 layers to GPU).

Inference should primarily utilize the dGPU, minimizing CPU usage for heavy computation.

Actual Behavior:

The behavior varied slightly depending on the environment variables used, but it consistently resulted in CPU-only operation:

With ROCR_VISIBLE_DEVICES=0 only (or initially):

Logs sometimes showed the correct initial intent (load request="{... GPULayers:49[ID:0 Layers:49(0..48)] ...}").

However, during the VRAM check phase for loading, it reported "available layer vram"="0 B" and logged "insufficient VRAM to load any model layers".

It then fell back to CPU (new layout created layers=[]) and loaded 0 layers onto the GPU (offloaded 0/49 layers to GPU).

With HIP_VISIBLE_DEVICES=0 (alone or with ROCR_VISIBLE_DEVICES):

The GPU discovery process during startup failed to select the dGPU.

It immediately selected the CPU (inference compute id=cpu library=cpu).

It entered "low vram mode" (entering low vram mode "total vram"="0 B").

All subsequent model loads targeted the CPU (GPULayers:[]) and resulted in offloaded 0/49 layers to GPU.

In all tested scenarios, the model runs entirely on the CPU, utilizing significant system RAM instead of the dedicated GPU VRAM.

Logs:

Relevant DEBUG log excerpts demonstrating the issue (GPU detection failure / incorrect VRAM reporting / fallback to CPU) can be provided upon request. Key messages include: inference compute id=cpu; entering low vram mode "total vram"="0 B"; "available layer vram"="0 B"; insufficient VRAM to load any model layers; offloaded 0/XX layers to GPU.
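As a quick triage aid, those key messages can be pulled out of a full debug log with a short script (a minimal sketch; the marker strings are copied from the log excerpts in this report, and the function name is hypothetical):

```python
import re

# Substrings of the log lines that indicate the CPU fallback described above.
CPU_FALLBACK_MARKERS = [
    'id=cpu library=cpu',                          # "inference compute" selected the CPU
    'entering low vram mode',                      # scheduler saw 0 B total VRAM
    'insufficient VRAM to load any model layers',  # per-load VRAM check failed
]

# e.g. 'offloaded 0/49 layers to GPU'
OFFLOAD_RE = re.compile(r'offloaded (\d+)/(\d+) layers to GPU')

def triage(log_text: str) -> dict:
    """Report which fallback markers appear and the offload ratio, if logged."""
    hits = [m for m in CPU_FALLBACK_MARKERS if m in log_text]
    offload = OFFLOAD_RE.search(log_text)
    return {
        "markers": hits,
        "offloaded": (int(offload.group(1)), int(offload.group(2))) if offload else None,
        "cpu_only": (offload is not None and offload.group(1) == "0") or bool(hits),
    }
```

Usage: triage(open("server.log", encoding="utf-8").read()) — on the log below it flags all three markers and an offload ratio of 0/49.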

Additional Context:

The integrated GPU (AMD Radeon Graphics on the 7900X3D) must remain enabled for other purposes and cannot be disabled in the BIOS/Device Manager as a workaround.

A clean installation of the latest recommended AMD Adrenalin drivers (25.9.1) using DDU was performed, but did not resolve the issue.

The issue seems related to GPU selection or VRAM reporting specifically within Ollama's ROCm integration on Windows when multiple AMD GPUs are present.

Relevant log output

time=2025-10-23T12:41:27.646+02:00 level=INFO source=routes.go:1511 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:0 HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:16000 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11223 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\lukas\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:0]"
time=2025-10-23T12:41:27.654+02:00 level=INFO source=images.go:522 msg="total blobs: 4"
time=2025-10-23T12:41:27.655+02:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-10-23T12:41:27.656+02:00 level=INFO source=routes.go:1564 msg="Listening on 127.0.0.1:11223 (version 0.12.6)"
time=2025-10-23T12:41:27.656+02:00 level=DEBUG source=sched.go:123 msg="starting llm scheduler"
time=2025-10-23T12:41:27.657+02:00 level=INFO source=runner.go:80 msg="discovering available GPUs..."
time=2025-10-23T12:41:27.657+02:00 level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=[]
time=2025-10-23T12:41:28.241+02:00 level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=583.9737ms OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=[]
time=2025-10-23T12:41:28.242+02:00 level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12]" extra_envs=[]
time=2025-10-23T12:41:28.344+02:00 level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=102.7464ms OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12]" extra_envs=[]
time=2025-10-23T12:41:28.345+02:00 level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=[]
time=2025-10-23T12:41:28.573+02:00 level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=228.2463ms OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=[]
time=2025-10-23T12:41:28.573+02:00 level=DEBUG source=runner.go:118 msg="filtering out unsupported or overlapping GPU library combinations" count=1
time=2025-10-23T12:41:28.574+02:00 level=DEBUG source=runner.go:130 msg="verifying GPU is supported" library=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm description="AMD Radeon(TM) Graphics" compute=gfx1036 pci_id=6e:00.0
time=2025-10-23T12:41:28.574+02:00 level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs="[GGML_CUDA_INIT=1 HIP_VISIBLE_DEVICES=0]"
time=2025-10-23T12:41:28.739+02:00 level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=165.2113ms OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs="[GGML_CUDA_INIT=1 HIP_VISIBLE_DEVICES=0]"
time=2025-10-23T12:41:28.739+02:00 level=DEBUG source=runner.go:45 msg="GPU bootstrap discovery took" duration=1.0837682s
time=2025-10-23T12:41:28.739+02:00 level=INFO source=types.go:129 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="63.1 GiB" available="45.7 GiB"
time=2025-10-23T12:41:28.739+02:00 level=INFO source=routes.go:1605 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
[GIN] 2025/10/23 - 12:41:33 | 404 |            0s |       127.0.0.1 | GET      "/api/events"
[GIN] 2025/10/23 - 12:41:39 | 404 |            0s |       127.0.0.1 | GET      "/api/events"
time=2025-10-23T12:41:40.753+02:00 level=DEBUG source=runner.go:259 msg="refreshing free memory"
time=2025-10-23T12:41:40.753+02:00 level=DEBUG source=runner.go:45 msg="overall device VRAM discovery took" duration=0s
time=2025-10-23T12:41:40.753+02:00 level=DEBUG source=sched.go:195 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-10-23T12:41:40.761+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-23T12:41:40.761+02:00 level=DEBUG source=sched.go:215 msg="loading first model" model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b
time=2025-10-23T12:41:40.793+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-23T12:41:40.795+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-10-23T12:41:40.795+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-10-23T12:41:40.796+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-10-23T12:41:40.796+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-10-23T12:41:40.796+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.type default=""
time=2025-10-23T12:41:40.796+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1
time=2025-10-23T12:41:40.796+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.original_context_length default=0
time=2025-10-23T12:41:40.796+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true
time=2025-10-23T12:41:40.796+02:00 level=INFO source=cpu_windows.go:139 msg=packages count=1
time=2025-10-23T12:41:40.796+02:00 level=INFO source=cpu_windows.go:186 msg="" package=0 cores=12 efficiency=0 threads=24
time=2025-10-23T12:41:40.796+02:00 level=INFO source=server.go:216 msg="enabling flash attention"
time=2025-10-23T12:41:40.799+02:00 level=INFO source=server.go:400 msg="starting runner" cmd="C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\lukas\\.ollama\\models\\blobs\\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b --port 56748"
time=2025-10-23T12:41:40.799+02:00 level=DEBUG source=server.go:401 msg=subprocess HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_VISIBLE_DEVICES=0 OLLAMA_CONTEXT_LENGTH=16000 OLLAMA_DEBUG=1 OLLAMA_GPU_OVERHEAD=0 OLLAMA_HOST=127.0.0.1:11223 OLLAMA_LOAD_TIMEOUT=5m0s OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_NUM_GPU=999 OLLAMA_NUM_PARALLEL=1 PATH="C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\Scripts;C:\\Program Files\\cmder\\vendor\\conemu-maximus5;C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu;C:\\WINDOWS\\System32\\AMD;C:\\Python311\\Scripts\\;C:\\Python311\\;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\dotnet\\;C:\\ProgramData\\chocolatey\\bin;C:\\Users\\lukas\\AppData\\Roaming\\nvm;C:\\Program Files\\nodejs;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\lukas\\AppData\\Roaming\\npm;C:\\Program Files\\cmder;C:\\Users\\lukas\\AppData\\Roaming\\nvm;C:\\Program Files\\nodejs;C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\PuTTY\\;C:\\Program 
Files\\PowerShell\\7\\;C:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama;C:\\Users\\lukas\\.lmstudio\\bin;C:\\Users\\lukas\\.dotnet\\tools;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\SST.opencode_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\mostlygeek.llama-swap_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\ggml.llamacpp_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Program Files\\Git\\mingw64\\bin;C:\\Program Files\\Git\\usr\\bin;C:\\Program Files\\cmder\\vendor\\bin;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama" ROCR_VISIBLE_DEVICES=0 OLLAMA_LIBRARY_PATH=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama
time=2025-10-23T12:41:40.806+02:00 level=INFO source=server.go:676 msg="loading model" "model layers"=49 requested=-1
time=2025-10-23T12:41:40.806+02:00 level=INFO source=cpu_windows.go:139 msg=packages count=1
time=2025-10-23T12:41:40.806+02:00 level=INFO source=cpu_windows.go:186 msg="" package=0 cores=12 efficiency=0 threads=24
time=2025-10-23T12:41:40.806+02:00 level=INFO source=server.go:682 msg="system memory" total="63.1 GiB" free="45.7 GiB" free_swap="38.6 GiB"
time=2025-10-23T12:41:40.840+02:00 level=INFO source=runner.go:1332 msg="starting ollama engine"
time=2025-10-23T12:41:40.846+02:00 level=INFO source=runner.go:1367 msg="Server listening on 127.0.0.1:56748"
time=2025-10-23T12:41:40.852+02:00 level=INFO source=runner.go:1205 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:12 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-23T12:41:40.867+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-23T12:41:40.868+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""
time=2025-10-23T12:41:40.868+02:00 level=INFO source=ggml.go:134 msg="" architecture=qwen3moe file_type=Q3_K_M name=Qwen3-Coder-30B-A3B-Instruct description="" num_tensors=579 num_key_values=45
time=2025-10-23T12:41:40.868+02:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama
load_backend: loaded CPU backend from C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-icelake.dll
time=2025-10-23T12:41:40.881+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.type default=""
time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1
time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.original_context_length default=0
time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true
time=2025-10-23T12:41:40.886+02:00 level=DEBUG source=ggml.go:837 msg="compute graph" nodes=2982 splits=1
time=2025-10-23T12:41:40.887+02:00 level=DEBUG source=device.go:211 msg="model weights" device=CPU size="12.9 GiB"
time=2025-10-23T12:41:40.887+02:00 level=DEBUG source=device.go:222 msg="kv cache" device=CPU size="768.0 MiB"
time=2025-10-23T12:41:40.887+02:00 level=DEBUG source=device.go:233 msg="compute graph" device=CPU size="84.0 MiB"
time=2025-10-23T12:41:40.888+02:00 level=DEBUG source=device.go:238 msg="total memory" size="13.7 GiB"
time=2025-10-23T12:41:40.888+02:00 level=DEBUG source=server.go:721 msg=memory success=true required.InputWeights=175030272 required.CPU.Weights="[273368064 300303360 300303360 297944064 271205376 273368064 271205376 273368064 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 297944064 271205376 271205376 271205376 271205376 271205376 271205376 297944064 271205376 271205376 271205376 271205376 273368064 271205376 271205376 271205376 273368064 273368064 271205376 273368064 297944064 297944064 297944064 300106752 300303360 301417472 301417472 255260672]" required.CPU.Cache="[16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 0]" required.CPU.Graph=88080384
time=2025-10-23T12:41:40.889+02:00 level=DEBUG source=server.go:732 msg="new layout created" layers=[]
time=2025-10-23T12:41:40.889+02:00 level=INFO source=runner.go:1205 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:12 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-23T12:41:40.906+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.type default=""
time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1
time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.original_context_length default=0
time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true
time=2025-10-23T12:41:41.007+02:00 level=DEBUG source=ggml.go:837 msg="compute graph" nodes=2982 splits=1
time=2025-10-23T12:41:41.007+02:00 level=DEBUG source=device.go:211 msg="model weights" device=CPU size="12.9 GiB"
time=2025-10-23T12:41:41.008+02:00 level=DEBUG source=device.go:222 msg="kv cache" device=CPU size="768.0 MiB"
time=2025-10-23T12:41:41.008+02:00 level=DEBUG source=device.go:233 msg="compute graph" device=CPU size="84.0 MiB"
time=2025-10-23T12:41:41.008+02:00 level=DEBUG source=device.go:238 msg="total memory" size="13.7 GiB"
time=2025-10-23T12:41:41.008+02:00 level=DEBUG source=server.go:721 msg=memory success=true required.InputWeights=175030272 required.CPU.Weights="[273368064 300303360 300303360 297944064 271205376 273368064 271205376 273368064 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 297944064 271205376 271205376 271205376 271205376 271205376 271205376 297944064 271205376 271205376 271205376 271205376 273368064 271205376 271205376 271205376 273368064 273368064 271205376 273368064 297944064 297944064 297944064 300106752 300303360 301417472 301417472 255260672]" required.CPU.Cache="[16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 0]" required.CPU.Graph=88080384
time=2025-10-23T12:41:41.008+02:00 level=DEBUG source=server.go:732 msg="new layout created" layers=[]
time=2025-10-23T12:41:41.009+02:00 level=INFO source=runner.go:1205 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:12 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-23T12:41:41.009+02:00 level=INFO source=ggml.go:480 msg="offloading 0 repeating layers to GPU"
time=2025-10-23T12:41:41.009+02:00 level=INFO source=ggml.go:484 msg="offloading output layer to CPU"
time=2025-10-23T12:41:41.009+02:00 level=INFO source=ggml.go:492 msg="offloaded 0/49 layers to GPU"
time=2025-10-23T12:41:41.010+02:00 level=INFO source=device.go:211 msg="model weights" device=CPU size="12.9 GiB"
time=2025-10-23T12:41:41.014+02:00 level=INFO source=device.go:222 msg="kv cache" device=CPU size="768.0 MiB"
time=2025-10-23T12:41:41.015+02:00 level=INFO source=device.go:233 msg="compute graph" device=CPU size="84.0 MiB"
time=2025-10-23T12:41:41.016+02:00 level=INFO source=device.go:238 msg="total memory" size="13.7 GiB"
time=2025-10-23T12:41:41.017+02:00 level=INFO source=sched.go:482 msg="loaded runners" count=1
time=2025-10-23T12:41:41.017+02:00 level=INFO source=server.go:1272 msg="waiting for llama runner to start responding"
time=2025-10-23T12:41:41.021+02:00 level=INFO source=server.go:1306 msg="waiting for server to become available" status="llm server loading model"
time=2025-10-23T12:41:41.022+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.01"
time=2025-10-23T12:41:41.284+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.27"
time=2025-10-23T12:41:41.550+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.53"
time=2025-10-23T12:41:41.814+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.80"
time=2025-10-23T12:41:42.042+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-10-23T12:41:42.075+02:00 level=INFO source=server.go:1310 msg="llama runner started in 1.28 seconds"
time=2025-10-23T12:41:42.075+02:00 level=DEBUG source=sched.go:494 msg="finished setting up" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.size="13.7 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=36752 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192
time=2025-10-23T12:41:42.113+02:00 level=DEBUG source=server.go:1422 msg="completion request" images=0 prompt=6988 format=""
time=2025-10-23T12:41:42.131+02:00 level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=0 prompt=1771 used=0 remaining=1771
[GIN] 2025/10/23 - 12:41:49 | 404 |            0s |       127.0.0.1 | GET      "/api/events"
[GIN] 2025/10/23 - 12:41:58 | 404 |            0s |       127.0.0.1 | GET      "/api/events"
[GIN] 2025/10/23 - 12:42:10 | 404 |            0s |       127.0.0.1 | GET      "/api/events"
[GIN] 2025/10/23 - 12:42:16 | 404 |            0s |       127.0.0.1 | GET      "/api/events"
[GIN] 2025/10/23 - 12:42:30 | 404 |            0s |       127.0.0.1 | GET      "/api/events"
[GIN] 2025/10/23 - 12:42:38 | 200 |   57.8249144s |       127.0.0.1 | POST     "/api/chat"
time=2025-10-23T12:42:38.571+02:00 level=DEBUG source=sched.go:502 msg="context for request finished"
time=2025-10-23T12:42:38.572+02:00 level=DEBUG source=sched.go:294 msg="runner with non-zero duration has gone idle, adding timer" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.size="13.7 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=36752 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192 duration=30m0s
time=2025-10-23T12:42:38.572+02:00 level=DEBUG source=sched.go:312 msg="after processing request finished event" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.size="13.7 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=36752 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192 refCount=0

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b time=2025-10-23T12:41:40.793+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-10-23T12:41:40.795+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0 time=2025-10-23T12:41:40.795+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2025-10-23T12:41:40.796+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2025-10-23T12:41:40.796+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2025-10-23T12:41:40.796+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.type default="" time=2025-10-23T12:41:40.796+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1 time=2025-10-23T12:41:40.796+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.original_context_length default=0 time=2025-10-23T12:41:40.796+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true time=2025-10-23T12:41:40.796+02:00 level=INFO source=cpu_windows.go:139 msg=packages count=1 time=2025-10-23T12:41:40.796+02:00 level=INFO source=cpu_windows.go:186 msg="" package=0 cores=12 efficiency=0 threads=24 time=2025-10-23T12:41:40.796+02:00 level=INFO source=server.go:216 msg="enabling flash attention" time=2025-10-23T12:41:40.799+02:00 level=INFO source=server.go:400 msg="starting runner" cmd="C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model 
C:\\Users\\lukas\\.ollama\\models\\blobs\\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b --port 56748" time=2025-10-23T12:41:40.799+02:00 level=DEBUG source=server.go:401 msg=subprocess HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_VISIBLE_DEVICES=0 OLLAMA_CONTEXT_LENGTH=16000 OLLAMA_DEBUG=1 OLLAMA_GPU_OVERHEAD=0 OLLAMA_HOST=127.0.0.1:11223 OLLAMA_LOAD_TIMEOUT=5m0s OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_NUM_GPU=999 OLLAMA_NUM_PARALLEL=1 PATH="C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\Scripts;C:\\Program Files\\cmder\\vendor\\conemu-maximus5;C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu;C:\\WINDOWS\\System32\\AMD;C:\\Python311\\Scripts\\;C:\\Python311\\;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\dotnet\\;C:\\ProgramData\\chocolatey\\bin;C:\\Users\\lukas\\AppData\\Roaming\\nvm;C:\\Program Files\\nodejs;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\lukas\\AppData\\Roaming\\npm;C:\\Program Files\\cmder;C:\\Users\\lukas\\AppData\\Roaming\\nvm;C:\\Program Files\\nodejs;C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\PuTTY\\;C:\\Program 
Files\\PowerShell\\7\\;C:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama;C:\\Users\\lukas\\.lmstudio\\bin;C:\\Users\\lukas\\.dotnet\\tools;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\SST.opencode_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\mostlygeek.llama-swap_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\ggml.llamacpp_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Program Files\\Git\\mingw64\\bin;C:\\Program Files\\Git\\usr\\bin;C:\\Program Files\\cmder\\vendor\\bin;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama" ROCR_VISIBLE_DEVICES=0 OLLAMA_LIBRARY_PATH=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama time=2025-10-23T12:41:40.806+02:00 level=INFO source=server.go:676 msg="loading model" "model layers"=49 requested=-1 time=2025-10-23T12:41:40.806+02:00 level=INFO source=cpu_windows.go:139 msg=packages count=1 time=2025-10-23T12:41:40.806+02:00 level=INFO source=cpu_windows.go:186 msg="" package=0 cores=12 efficiency=0 threads=24 time=2025-10-23T12:41:40.806+02:00 level=INFO source=server.go:682 msg="system memory" total="63.1 GiB" free="45.7 GiB" free_swap="38.6 GiB" time=2025-10-23T12:41:40.840+02:00 level=INFO source=runner.go:1332 msg="starting ollama engine" time=2025-10-23T12:41:40.846+02:00 level=INFO source=runner.go:1367 msg="Server listening on 127.0.0.1:56748" time=2025-10-23T12:41:40.852+02:00 level=INFO source=runner.go:1205 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:12 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2025-10-23T12:41:40.867+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-10-23T12:41:40.868+02:00 level=DEBUG source=ggml.go:276 msg="key with 
type not found" key=general.description default="" time=2025-10-23T12:41:40.868+02:00 level=INFO source=ggml.go:134 msg="" architecture=qwen3moe file_type=Q3_K_M name=Qwen3-Coder-30B-A3B-Instruct description="" num_tensors=579 num_key_values=45 time=2025-10-23T12:41:40.868+02:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama load_backend: loaded CPU backend from C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-icelake.dll time=2025-10-23T12:41:40.881+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang) time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0 time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.type default="" time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1 time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.original_context_length default=0 time=2025-10-23T12:41:40.882+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true time=2025-10-23T12:41:40.886+02:00 level=DEBUG 
source=ggml.go:837 msg="compute graph" nodes=2982 splits=1 time=2025-10-23T12:41:40.887+02:00 level=DEBUG source=device.go:211 msg="model weights" device=CPU size="12.9 GiB" time=2025-10-23T12:41:40.887+02:00 level=DEBUG source=device.go:222 msg="kv cache" device=CPU size="768.0 MiB" time=2025-10-23T12:41:40.887+02:00 level=DEBUG source=device.go:233 msg="compute graph" device=CPU size="84.0 MiB" time=2025-10-23T12:41:40.888+02:00 level=DEBUG source=device.go:238 msg="total memory" size="13.7 GiB" time=2025-10-23T12:41:40.888+02:00 level=DEBUG source=server.go:721 msg=memory success=true required.InputWeights=175030272 required.CPU.Weights="[273368064 300303360 300303360 297944064 271205376 273368064 271205376 273368064 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 297944064 271205376 271205376 271205376 271205376 271205376 271205376 297944064 271205376 271205376 271205376 271205376 273368064 271205376 271205376 271205376 273368064 273368064 271205376 273368064 297944064 297944064 297944064 300106752 300303360 301417472 301417472 255260672]" required.CPU.Cache="[16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 0]" required.CPU.Graph=88080384 time=2025-10-23T12:41:40.889+02:00 level=DEBUG source=server.go:732 msg="new layout created" layers=[] time=2025-10-23T12:41:40.889+02:00 level=INFO source=runner.go:1205 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:12 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 
UseMmap:false}" time=2025-10-23T12:41:40.906+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0 time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.type default="" time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1 time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.original_context_length default=0 time=2025-10-23T12:41:40.909+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true time=2025-10-23T12:41:41.007+02:00 level=DEBUG source=ggml.go:837 msg="compute graph" nodes=2982 splits=1 time=2025-10-23T12:41:41.007+02:00 level=DEBUG source=device.go:211 msg="model weights" device=CPU size="12.9 GiB" time=2025-10-23T12:41:41.008+02:00 level=DEBUG source=device.go:222 msg="kv cache" device=CPU size="768.0 MiB" time=2025-10-23T12:41:41.008+02:00 level=DEBUG source=device.go:233 msg="compute graph" device=CPU size="84.0 MiB" time=2025-10-23T12:41:41.008+02:00 level=DEBUG source=device.go:238 msg="total memory" size="13.7 GiB" time=2025-10-23T12:41:41.008+02:00 level=DEBUG source=server.go:721 msg=memory success=true required.InputWeights=175030272 required.CPU.Weights="[273368064 
300303360 300303360 297944064 271205376 273368064 271205376 273368064 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 297944064 271205376 271205376 271205376 271205376 271205376 271205376 297944064 271205376 271205376 271205376 271205376 273368064 271205376 271205376 271205376 273368064 273368064 271205376 273368064 297944064 297944064 297944064 300106752 300303360 301417472 301417472 255260672]" required.CPU.Cache="[16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 0]" required.CPU.Graph=88080384 time=2025-10-23T12:41:41.008+02:00 level=DEBUG source=server.go:732 msg="new layout created" layers=[] time=2025-10-23T12:41:41.009+02:00 level=INFO source=runner.go:1205 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:12 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2025-10-23T12:41:41.009+02:00 level=INFO source=ggml.go:480 msg="offloading 0 repeating layers to GPU" time=2025-10-23T12:41:41.009+02:00 level=INFO source=ggml.go:484 msg="offloading output layer to CPU" time=2025-10-23T12:41:41.009+02:00 level=INFO source=ggml.go:492 msg="offloaded 0/49 layers to GPU" time=2025-10-23T12:41:41.010+02:00 level=INFO source=device.go:211 msg="model weights" device=CPU size="12.9 GiB" time=2025-10-23T12:41:41.014+02:00 level=INFO source=device.go:222 msg="kv cache" device=CPU size="768.0 MiB" time=2025-10-23T12:41:41.015+02:00 level=INFO source=device.go:233 msg="compute graph" device=CPU size="84.0 MiB" 
time=2025-10-23T12:41:41.016+02:00 level=INFO source=device.go:238 msg="total memory" size="13.7 GiB" time=2025-10-23T12:41:41.017+02:00 level=INFO source=sched.go:482 msg="loaded runners" count=1 time=2025-10-23T12:41:41.017+02:00 level=INFO source=server.go:1272 msg="waiting for llama runner to start responding" time=2025-10-23T12:41:41.021+02:00 level=INFO source=server.go:1306 msg="waiting for server to become available" status="llm server loading model" time=2025-10-23T12:41:41.022+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.01" time=2025-10-23T12:41:41.284+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.27" time=2025-10-23T12:41:41.550+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.53" time=2025-10-23T12:41:41.814+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.80" time=2025-10-23T12:41:42.042+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0 time=2025-10-23T12:41:42.075+02:00 level=INFO source=server.go:1310 msg="llama runner started in 1.28 seconds" time=2025-10-23T12:41:42.075+02:00 level=DEBUG source=sched.go:494 msg="finished setting up" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.size="13.7 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=36752 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192 time=2025-10-23T12:41:42.113+02:00 level=DEBUG source=server.go:1422 msg="completion request" images=0 prompt=6988 format="" time=2025-10-23T12:41:42.131+02:00 level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=0 prompt=1771 used=0 remaining=1771 [GIN] 2025/10/23 - 12:41:49 | 404 | 0s | 127.0.0.1 | GET "/api/events" [GIN] 2025/10/23 - 12:41:58 | 404 | 0s | 127.0.0.1 | GET "/api/events" [GIN] 2025/10/23 - 12:42:10 | 404 | 0s | 127.0.0.1 | GET "/api/events" [GIN] 2025/10/23 - 12:42:16 | 404 | 
0s | 127.0.0.1 | GET "/api/events"
[GIN] 2025/10/23 - 12:42:30 | 404 | 0s | 127.0.0.1 | GET "/api/events"
[GIN] 2025/10/23 - 12:42:38 | 200 | 57.8249144s | 127.0.0.1 | POST "/api/chat"
time=2025-10-23T12:42:38.571+02:00 level=DEBUG source=sched.go:502 msg="context for request finished"
time=2025-10-23T12:42:38.572+02:00 level=DEBUG source=sched.go:294 msg="runner with non-zero duration has gone idle, adding timer" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.size="13.7 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=36752 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192 duration=30m0s
time=2025-10-23T12:42:38.572+02:00 level=DEBUG source=sched.go:312 msg="after processing request finished event" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.size="13.7 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=36752 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192 refCount=0
```

### OS

_No response_

### GPU

_No response_

### CPU

_No response_

### Ollama version

_No response_
GiteaMirror added the bug, amd, windows labels 2026-04-29 08:05:48 -05:00

@dhiltgen commented on GitHub (Oct 23, 2025):

@AwarePL can you confirm that, if you do not set HIP_VISIBLE_DEVICES or ROCR_VISIBLE_DEVICES, Ollama correctly discovers your discrete GPUs?

<!-- gh-comment-id:3439465423 -->

@AwarePL commented on GitHub (Oct 24, 2025):

It does, just does not use it ;)
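(A sketch of the corresponding clean-environment restart, inferred from the log below where the visibility variables are empty; in cmd, `set NAME=` with an empty value deletes the variable from the session:)

```shell
:: Clear the device-visibility overrides in the same admin cmd session,
:: then restart the server so discovery runs without them.
set HIP_VISIBLE_DEVICES=
set ROCR_VISIBLE_DEVICES=
ollama serve
```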

Here's the log:

DE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:16000 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11223 OLLAMA_INTE
L_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\lukas\\.ollama\\mode
ls OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* htt
ps://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webv
iew://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"                                                                                             
time=2025-10-24T09:44:37.249+02:00 level=INFO source=images.go:522 msg="total blobs: 4"                                                                                                            
time=2025-10-24T09:44:37.249+02:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"                                                                                             
time=2025-10-24T09:44:37.250+02:00 level=INFO source=routes.go:1564 msg="Listening on 127.0.0.1:11223 (version 0.12.6)"                                                                            
time=2025-10-24T09:44:37.250+02:00 level=DEBUG source=sched.go:123 msg="starting llm scheduler"                                                                                                    
time=2025-10-24T09:44:37.251+02:00 level=INFO source=runner.go:80 msg="discovering available GPUs..."                                                                                              
time=2025-10-24T09:44:37.251+02:00 level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\luk
as\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12]" extra_envs=[]                                                                                                                        
time=2025-10-24T09:44:37.352+02:00 level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=100.4785ms OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\li
b\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12]" extra_envs=[]                                                                                                
time=2025-10-24T09:44:37.353+02:00 level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\luk
as\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=[]                                                                                                                        
time=2025-10-24T09:44:37.546+02:00 level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=192.9205ms OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\li
b\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=[]                                                                                                
time=2025-10-24T09:44:37.546+02:00 level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\luk
as\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=[]                                                                                                                            
time=2025-10-24T09:44:38.349+02:00 level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=802.2932ms OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\li
b\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=[]                                                                                                    
time=2025-10-24T09:44:38.349+02:00 level=DEBUG source=runner.go:118 msg="filtering out unsupported or overlapping GPU library combinations" count=2                                                
time=2025-10-24T09:44:38.349+02:00 level=DEBUG source=runner.go:130 msg="verifying GPU is supported" library=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm description="AMD Radeon(T
M) Graphics" compute=gfx1036 pci_id=6e:00.0                                                                                                                                                        
time=2025-10-24T09:44:38.349+02:00 level=DEBUG source=runner.go:130 msg="verifying GPU is supported" library=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm description="AMD Radeon R
X 7900 XTX" compute=gfx1100 pci_id=03:00.0                                                                                                                                                         
time=2025-10-24T09:44:38.349+02:00 level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\luk
as\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs="[GGML_CUDA_INIT=1 HIP_VISIBLE_DEVICES=1]"                                                                                    
time=2025-10-24T09:44:38.349+02:00 level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\luk
as\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs="[GGML_CUDA_INIT=1 HIP_VISIBLE_DEVICES=0]"                                                                                    
time=2025-10-24T09:44:38.534+02:00 level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=184.1261ms OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\li
b\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs="[GGML_CUDA_INIT=1 HIP_VISIBLE_DEVICES=0]"                                                            
time=2025-10-24T09:44:40.256+02:00 level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=1.9061222s OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\li
b\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs="[GGML_CUDA_INIT=1 HIP_VISIBLE_DEVICES=1]"                                                            
time=2025-10-24T09:44:40.256+02:00 level=DEBUG source=runner.go:45 msg="GPU bootstrap discovery took" duration=3.0060072s                                                                          
time=2025-10-24T09:44:40.257+02:00 level=INFO source=types.go:112 msg="inference compute" id=0 library=ROCm compute=gfx1100 name=ROCm1 description="AMD Radeon RX 7900 XTX" libdirs=ollama,rocm dri
ver=60450.10 pci_id=03:00.0 type=discrete total="24.0 GiB" available="22.5 GiB"                                                                                                                    
time=2025-10-24T09:45:06.560+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32                                                                   
[GIN] 2025/10/24 - 09:45:06 | 200 |      43.355ms |       127.0.0.1 | POST     "/api/show"                                                                                                         
time=2025-10-24T09:45:06.815+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32                                                                   
[GIN] 2025/10/24 - 09:45:06 | 200 |      40.737ms |       127.0.0.1 | POST     "/api/show"                                                                                                         
time=2025-10-24T09:45:24.184+02:00 level=DEBUG source=runner.go:259 msg="refreshing free memory"                                                                                                   
time=2025-10-24T09:45:24.184+02:00 level=DEBUG source=runner.go:323 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"                                         
time=2025-10-24T09:45:24.184+02:00 level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\luk
as\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=[]                                                                                                                            
time=2025-10-24T09:45:24.977+02:00 level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=793.0132ms OLLAMA_LIBRARY_PATH="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\li
b\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=[]                                                                                                    
time=2025-10-24T09:45:24.978+02:00 level=DEBUG source=runner.go:45 msg="overall device VRAM discovery took" duration=793.5228ms                                                                    
time=2025-10-24T09:45:24.978+02:00 level=DEBUG source=sched.go:195 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1                                                       
time=2025-10-24T09:45:24.991+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32                                                                   
time=2025-10-24T09:45:24.992+02:00 level=DEBUG source=sched.go:215 msg="loading first model" model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e
7eb3d8c553b                                                                                                                                                                                        
time=2025-10-24T09:45:25.026+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32                                                                   
time=2025-10-24T09:45:25.027+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0                                                                
time=2025-10-24T09:45:25.027+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0                                                          
time=2025-10-24T09:45:25.027+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false                                                     
time=2025-10-24T09:45:25.027+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"                                     
time=2025-10-24T09:45:25.028+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.type default=""                                                          
time=2025-10-24T09:45:25.028+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1                                                         
time=2025-10-24T09:45:25.028+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.original_context_length default=0                                        
time=2025-10-24T09:45:25.028+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true                                                          
time=2025-10-24T09:45:25.028+02:00 level=INFO source=cpu_windows.go:139 msg=packages count=1                                                                                                       
time=2025-10-24T09:45:25.029+02:00 level=INFO source=cpu_windows.go:186 msg="" package=0 cores=12 efficiency=0 threads=24                                                                          
time=2025-10-24T09:45:25.029+02:00 level=INFO source=server.go:216 msg="enabling flash attention"                                                                                                  
time=2025-10-24T09:45:25.029+02:00 level=DEBUG source=server.go:331 msg="adding gpu dependency paths" paths="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]"
time=2025-10-24T09:45:25.029+02:00 level=INFO source=server.go:400 msg="starting runner" cmd="C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\lukas\\.ollama\\models\\blobs\\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b --port 50996"
time=2025-10-24T09:45:25.030+02:00 level=DEBUG source=server.go:401 msg=subprocess HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_CONTEXT_LENGTH=16000 OLLAMA_DEBUG=1 OLLAMA_GPU_OVERHEAD=0 OLLAMA_HOST=127.0.0.1:11223 OLLAMA_LOAD_TIMEOUT=5m0s OLLAMA_MAX_LOADED_MODELS=3 OLLAMA_NUM_GPU=999 OLLAMA_NUM_PARALLEL=1 PATH="C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\Scripts;C:\\Program Files\\cmder\\vendor\\conemu-maximus5;C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu;C:\\WINDOWS\\System32\\AMD;C:\\Python311\\Scripts\\;C:\\Python311\\;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\dotnet\\;C:\\ProgramData\\chocolatey\\bin;C:\\Users\\lukas\\AppData\\Roaming\\nvm;C:\\Program Files\\nodejs;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\lukas\\AppData\\Roaming\\npm;C:\\Program Files\\cmder;C:\\Users\\lukas\\AppData\\Roaming\\nvm;C:\\Program Files\\nodejs;C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\PuTTY\\;C:\\Program Files\\PowerShell\\7\\;C:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama;C:\\Users\\lukas\\.lmstudio\\bin;C:\\Users\\lukas\\.dotnet\\tools;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\SST.opencode_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\mostlygeek.llama-swap_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\ggml.llamacpp_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Program Files\\Git\\mingw64\\bin;C:\\Program Files\\Git\\usr\\bin;C:\\Program Files\\cmder\\vendor\\bin;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama" OLLAMA_LIBRARY_PATH=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm;C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=1
time=2025-10-24T09:45:25.035+02:00 level=INFO source=server.go:676 msg="loading model" "model layers"=49 requested=-1                                                                              
time=2025-10-24T09:45:25.036+02:00 level=INFO source=cpu_windows.go:139 msg=packages count=1                                                                                                       
time=2025-10-24T09:45:25.036+02:00 level=INFO source=cpu_windows.go:186 msg="" package=0 cores=12 efficiency=0 threads=24                                                                          
time=2025-10-24T09:45:25.036+02:00 level=INFO source=server.go:682 msg="system memory" total="63.1 GiB" free="46.3 GiB" free_swap="45.9 GiB"                                                       
time=2025-10-24T09:45:25.036+02:00 level=INFO source=server.go:690 msg="gpu memory" id=0 library=ROCm available="0 B" free="438.0 MiB" minimum="457.0 MiB" overhead="0 B"                          
time=2025-10-24T09:45:25.067+02:00 level=INFO source=runner.go:1332 msg="starting ollama engine"                                                                                                   
time=2025-10-24T09:45:25.072+02:00 level=INFO source=runner.go:1367 msg="Server listening on 127.0.0.1:50996"                                                                                      
time=2025-10-24T09:45:25.082+02:00 level=INFO source=runner.go:1205 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:12 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-24T09:45:25.097+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32                                                                   
time=2025-10-24T09:45:25.098+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""                                                                 
time=2025-10-24T09:45:25.098+02:00 level=INFO source=ggml.go:134 msg="" architecture=qwen3moe file_type=Q3_K_M name=Qwen3-Coder-30B-A3B-Instruct description="" num_tensors=579 num_key_values=45  
time=2025-10-24T09:45:25.098+02:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama                                
load_backend: loaded CPU backend from C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-icelake.dll                                                                                 
time=2025-10-24T09:45:25.110+02:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm                           
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no                                                                                                                                                         
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no                                                                                                                                                         
ggml_cuda_init: found 1 ROCm devices:                                                                                                                                                              
  Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32, ID: 0                                                                                                                
load_backend: loaded ROCm backend from C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm\ggml-hip.dll                                                                                   
time=2025-10-24T09:45:25.143+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2025-10-24T09:45:25.147+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0                                                                
time=2025-10-24T09:45:25.147+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0                                                          
time=2025-10-24T09:45:25.147+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false                                                     
time=2025-10-24T09:45:25.147+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"                                     
time=2025-10-24T09:45:25.147+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.type default=""                                                          
time=2025-10-24T09:45:25.147+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1                                                         
time=2025-10-24T09:45:25.147+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.original_context_length default=0                                        
time=2025-10-24T09:45:25.147+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true                                                          
time=2025-10-24T09:45:25.810+02:00 level=DEBUG source=ggml.go:837 msg="compute graph" nodes=2982 splits=2                                                                                          
time=2025-10-24T09:45:25.812+02:00 level=DEBUG source=device.go:206 msg="model weights" device=ROCm0 size="12.7 GiB"                                                                               
time=2025-10-24T09:45:25.812+02:00 level=DEBUG source=device.go:211 msg="model weights" device=CPU size="166.9 MiB"                                                                                
time=2025-10-24T09:45:25.813+02:00 level=DEBUG source=device.go:217 msg="kv cache" device=ROCm0 size="768.0 MiB"                                                                                   
time=2025-10-24T09:45:25.813+02:00 level=DEBUG source=device.go:228 msg="compute graph" device=ROCm0 size="115.1 MiB"                                                                              
time=2025-10-24T09:45:25.813+02:00 level=DEBUG source=device.go:233 msg="compute graph" device=CPU size="4.0 MiB"                                                                                  
time=2025-10-24T09:45:25.813+02:00 level=DEBUG source=device.go:238 msg="total memory" size="13.7 GiB"                                                                                             
time=2025-10-24T09:45:25.813+02:00 level=DEBUG source=server.go:721 msg=memory success=true required.InputWeights=175030272 required.CPU.Graph=4194304 required.ROCm0.ID=0 required.ROCm0.Weights="[273368192 300303616 300303616 297944320 271205504 273368192 271205504 273368192 271205504 271205504 271205504 271205504 271205504 271205504 271205504 271205504 271205504 271205504 271205504 271205504 271205504 297944320 271205504 271205504 271205504 271205504 271205504 271205504 297944320 271205504 271205504 271205504 271205504 273368192 271205504 271205504 271205504 273368192 273368192 271205504 273368192 297944320 297944320 297944320 300107008 300303616 301417728 301417728 255260672]" required.ROCm0.Cache="[16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 0]" required.ROCm0.Graph=120693632
time=2025-10-24T09:45:25.814+02:00 level=DEBUG source=server.go:915 msg="available gpu" id=0 library=ROCm "available layer vram"="0 B" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="115.1 MiB"
time=2025-10-24T09:45:25.814+02:00 level=DEBUG source=server.go:990 msg="insufficient VRAM to load any model layers"                                                                               
time=2025-10-24T09:45:25.814+02:00 level=DEBUG source=server.go:732 msg="new layout created" layers=[]                                                                                             
time=2025-10-24T09:45:25.815+02:00 level=INFO source=runner.go:1205 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:12 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-24T09:45:25.830+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32                                                                   
time=2025-10-24T09:45:25.833+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0                                                                
time=2025-10-24T09:45:25.833+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0                                                          
time=2025-10-24T09:45:25.833+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false                                                     
time=2025-10-24T09:45:25.833+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"                                     
time=2025-10-24T09:45:25.833+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.type default=""                                                          
time=2025-10-24T09:45:25.833+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1                                                         
time=2025-10-24T09:45:25.833+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.original_context_length default=0                                        
time=2025-10-24T09:45:25.833+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true                                                          
time=2025-10-24T09:45:25.837+02:00 level=DEBUG source=ggml.go:837 msg="compute graph" nodes=2982 splits=1                                                                                          
time=2025-10-24T09:45:25.838+02:00 level=DEBUG source=device.go:211 msg="model weights" device=CPU size="12.9 GiB"                                                                                 
time=2025-10-24T09:45:25.838+02:00 level=DEBUG source=device.go:222 msg="kv cache" device=CPU size="768.0 MiB"                                                                                     
time=2025-10-24T09:45:25.838+02:00 level=DEBUG source=device.go:233 msg="compute graph" device=CPU size="84.0 MiB"                                                                                 
time=2025-10-24T09:45:25.838+02:00 level=DEBUG source=device.go:238 msg="total memory" size="13.7 GiB"                                                                                             
time=2025-10-24T09:45:25.839+02:00 level=DEBUG source=server.go:721 msg=memory success=true required.InputWeights=175030272 required.CPU.Weights="[273368064 300303360 300303360 297944064 271205376 273368064 271205376 273368064 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 297944064 271205376 271205376 271205376 271205376 271205376 271205376 297944064 271205376 271205376 271205376 271205376 273368064 271205376 271205376 271205376 273368064 273368064 271205376 273368064 297944064 297944064 297944064 300106752 300303360 301417472 301417472 255260672]" required.CPU.Cache="[16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 0]" required.CPU.Graph=88080384
time=2025-10-24T09:45:25.840+02:00 level=DEBUG source=server.go:915 msg="available gpu" id=0 library=ROCm "available layer vram"="0 B" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B" 
time=2025-10-24T09:45:25.840+02:00 level=DEBUG source=server.go:990 msg="insufficient VRAM to load any model layers"                                                                               
time=2025-10-24T09:45:25.840+02:00 level=DEBUG source=server.go:732 msg="new layout created" layers=[]                                                                                             
time=2025-10-24T09:45:25.841+02:00 level=INFO source=runner.go:1205 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:12 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-24T09:45:25.858+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32                                                                   
time=2025-10-24T09:45:25.861+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0                                                                
time=2025-10-24T09:45:25.861+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0                                                          
time=2025-10-24T09:45:25.861+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false                                                     
time=2025-10-24T09:45:25.861+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"                                     
time=2025-10-24T09:45:25.861+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.type default=""                                                          
time=2025-10-24T09:45:25.861+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1                                                         
time=2025-10-24T09:45:25.861+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.original_context_length default=0                                        
time=2025-10-24T09:45:25.861+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true                                                          
time=2025-10-24T09:45:25.938+02:00 level=DEBUG source=ggml.go:837 msg="compute graph" nodes=2982 splits=1                                                                                          
time=2025-10-24T09:45:25.939+02:00 level=DEBUG source=device.go:211 msg="model weights" device=CPU size="12.9 GiB"                                                                                 
time=2025-10-24T09:45:25.939+02:00 level=DEBUG source=device.go:222 msg="kv cache" device=CPU size="768.0 MiB"                                                                                     
time=2025-10-24T09:45:25.939+02:00 level=DEBUG source=device.go:233 msg="compute graph" device=CPU size="84.0 MiB"                                                                                 
time=2025-10-24T09:45:25.939+02:00 level=DEBUG source=device.go:238 msg="total memory" size="13.7 GiB"                                                                                             
time=2025-10-24T09:45:25.939+02:00 level=DEBUG source=server.go:721 msg=memory success=true required.InputWeights=175030272 required.CPU.Weights="[273368064 300303360 300303360 297944064 27120537
6 273368064 271205376 273368064 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 271205376 297944064 271205376 271205376 271
205376 271205376 271205376 271205376 297944064 271205376 271205376 271205376 271205376 273368064 271205376 271205376 271205376 273368064 273368064 271205376 273368064 297944064 297944064 29794406
4 300106752 300303360 301417472 301417472 255260672]" required.CPU.Cache="[16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 167
77216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 
16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 16777216 0]" required.CPU.Graph=88080384                                               
time=2025-10-24T09:45:25.940+02:00 level=DEBUG source=server.go:915 msg="available gpu" id=0 library=ROCm "available layer vram"="0 B" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B" 
time=2025-10-24T09:45:25.940+02:00 level=DEBUG source=server.go:990 msg="insufficient VRAM to load any model layers"                                                                               
time=2025-10-24T09:45:25.940+02:00 level=DEBUG source=server.go:732 msg="new layout created" layers=[]                                                                                             
time=2025-10-24T09:45:25.941+02:00 level=INFO source=runner.go:1205 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:12 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-24T09:45:25.941+02:00 level=INFO source=ggml.go:480 msg="offloading 0 repeating layers to GPU"                                                                                        
time=2025-10-24T09:45:25.941+02:00 level=INFO source=ggml.go:484 msg="offloading output layer to CPU"                                                                                              
time=2025-10-24T09:45:25.941+02:00 level=INFO source=ggml.go:492 msg="offloaded 0/49 layers to GPU"                                                                                                
time=2025-10-24T09:45:25.942+02:00 level=INFO source=device.go:211 msg="model weights" device=CPU size="12.9 GiB"                                                                                  
time=2025-10-24T09:45:25.942+02:00 level=INFO source=device.go:222 msg="kv cache" device=CPU size="768.0 MiB"                                                                                      
time=2025-10-24T09:45:25.943+02:00 level=INFO source=device.go:233 msg="compute graph" device=CPU size="84.0 MiB"                                                                                  
time=2025-10-24T09:45:25.943+02:00 level=INFO source=device.go:238 msg="total memory" size="13.7 GiB"                                                                                              
time=2025-10-24T09:45:25.943+02:00 level=INFO source=sched.go:482 msg="loaded runners" count=1                                                                                                     
time=2025-10-24T09:45:25.943+02:00 level=INFO source=server.go:1272 msg="waiting for llama runner to start responding"                                                                             
time=2025-10-24T09:45:25.945+02:00 level=INFO source=server.go:1306 msg="waiting for server to become available" status="llm server loading model"                                                 
time=2025-10-24T09:45:25.945+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.00"                                                                                                
time=2025-10-24T09:45:26.197+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.06"                                                                                                
time=2025-10-24T09:45:26.448+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.12"                                                                                                
time=2025-10-24T09:45:26.699+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.18"                                                                                                
time=2025-10-24T09:45:26.950+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.24"                                                                                                
time=2025-10-24T09:45:27.202+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.30"                                                                                                
time=2025-10-24T09:45:27.454+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.35"                                                                                                
time=2025-10-24T09:45:27.706+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.41"                                                                                                
time=2025-10-24T09:45:27.957+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.47"                                                                                                
time=2025-10-24T09:45:28.208+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.53"                                                                                                
time=2025-10-24T09:45:28.460+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.59"                                                                                                
time=2025-10-24T09:45:28.712+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.65"                                                                                                
time=2025-10-24T09:45:28.964+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.71"                                                                                                
time=2025-10-24T09:45:29.215+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.77"                                                                                                
time=2025-10-24T09:45:29.467+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.83"                                                                                                
time=2025-10-24T09:45:29.718+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.89"                                                                                                
time=2025-10-24T09:45:29.970+02:00 level=DEBUG source=server.go:1316 msg="model load progress 0.95"                                                                                                
time=2025-10-24T09:45:30.205+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0                                                                
time=2025-10-24T09:45:30.221+02:00 level=INFO source=server.go:1310 msg="llama runner started in 5.19 seconds"                                                                                     
time=2025-10-24T09:45:30.222+02:00 level=DEBUG source=sched.go:494 msg="finished setting up" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.size="13.7 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=19568 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192
time=2025-10-24T09:45:30.260+02:00 level=DEBUG source=server.go:1422 msg="completion request" images=0 prompt=7480 format=""                                                                       
time=2025-10-24T09:45:30.276+02:00 level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=0 prompt=1927 used=0 remaining=1927                                                         
[GIN] 2025/10/24 - 09:45:58 | 200 |   34.3001205s |       127.0.0.1 | POST     "/api/chat"                                                                                                         
time=2025-10-24T09:45:58.424+02:00 level=DEBUG source=sched.go:502 msg="context for request finished"                                                                                              
time=2025-10-24T09:45:58.424+02:00 level=DEBUG source=sched.go:294 msg="runner with non-zero duration has gone idle, adding timer" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.size="13.7 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=19568 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192 duration=30m0s
time=2025-10-24T09:45:58.425+02:00 level=DEBUG source=sched.go:312 msg="after processing request finished event" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.size
="13.7 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=19568 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.nu
m_ctx=8192 refCount=0                                                                                                                                                                              
                                                                                                                                                                                                   ```

@ganakee commented on GitHub (Oct 24, 2025):

A similar issue appears with an AMD 6650 (10.3.0). Linux no longer sees the integrated GPU and falls back to CPU with 0.12.6.

I temporarily re-installed 0.12.2. `nvtop` on Linux shows the AMD GPU working fine in 0.12.2 but no GPU usage in 0.12.6. Same setup. Same config.

As reported, Ollama 0.12.6 does not seem to see the environment variables, whether set in the Linux `.bashrc` or in an `ollama.service` unit via `systemd`.

I set `HIP_VISIBLE_DEVICES` and/or `ROCR_VISIBLE_DEVICES` in 0.12.6 with no apparent effect (and also the HFX override).
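For the `systemd` case: variables exported in `.bashrc` are never inherited by the service, so they have to go into a drop-in override. A minimal sketch (assuming the unit is named `ollama.service`; values are illustrative):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# Create via `sudo systemctl edit ollama.service`, then run
# `sudo systemctl daemon-reload && sudo systemctl restart ollama.service`.
[Service]
Environment="HIP_VISIBLE_DEVICES=0"
Environment="OLLAMA_DEBUG=1"
```

Whether the variables then have any effect in 0.12.6 is the open question above, but this at least rules out the shell-vs-service delivery problem.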

<!-- gh-comment-id:3443939682 -->

@dhiltgen commented on GitHub (Oct 25, 2025):

I think the problem is likely related to IDs getting mixed up due to the iGPU being hidden. ROCm enumerates ID=0 as the iGPU and ID=1 as the discrete GPU, but once we filter out the iGPU, the discrete GPU becomes 0, and somewhere when we're looking up VRAM info it is getting mapped back to the iGPU incorrectly. This might be fixed on main already, but I'll try to repro and confirm.
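The suspected mix-up can be sketched as follows (illustrative only, with hypothetical names; this is not Ollama's actual code):

```python
# ROCm enumerates the iGPU as HIP device 0 and the dGPU as device 1
# (matching the amd_windows.go log lines in this issue).
devices = [
    {"hip_id": 0, "name": "AMD Radeon(TM) Graphics", "gfx": "gfx1036"},  # iGPU
    {"hip_id": 1, "name": "AMD Radeon RX 7900 XTX", "gfx": "gfx1100"},   # dGPU
]

# Filtering out the unsupported iGPU re-indexes the dGPU to position 0.
supported = [d for d in devices if d["gfx"] != "gfx1036"]
filtered_id = 0  # the dGPU's index in the filtered list

# Bug: a later VRAM lookup keyed by the filtered index against the
# ORIGINAL enumeration lands on the iGPU again.
wrong_device = devices[filtered_id]
print(wrong_device["name"])  # AMD Radeon(TM) Graphics (the iGPU)

# Fix: carry the original hip_id through the filter instead.
right_device = devices[supported[filtered_id]["hip_id"]]
print(right_device["name"])  # AMD Radeon RX 7900 XTX
```

That kind of stale-index lookup would explain why discovery reports the RX 7900 XTX correctly while the scheduler later sees `vram="0 B"` and falls back to CPU.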

<!-- gh-comment-id:3445298372 -->

@AwarePL commented on GitHub (Oct 25, 2025):

Downgrading to 0.12.3 solved it!

time=2025-10-26T01:39:15.714+02:00 level=INFO source=images.go:518 msg="total blobs: 4"
time=2025-10-26T01:39:15.714+02:00 level=INFO source=images.go:525 msg="total unused blobs removed: 0"
time=2025-10-26T01:39:15.715+02:00 level=INFO source=routes.go:1528 msg="Listening on 127.0.0.1:11223 (version 0.12.3)"
time=2025-10-26T01:39:15.715+02:00 level=DEBUG source=sched.go:121 msg="starting llm scheduler"
time=2025-10-26T01:39:15.716+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-10-26T01:39:15.716+02:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-10-26T01:39:15.716+02:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=12 efficiency=0 threads=24
time=2025-10-26T01:39:15.716+02:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-10-26T01:39:15.716+02:00 level=DEBUG source=gpu.go:520 msg="Searching for GPU library" name=nvml.dll
time=2025-10-26T01:39:15.716+02:00 level=DEBUG source=gpu.go:544 msg="gpu library search" globs="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\nvml.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\Scripts\\nvml.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\nvml.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\nvml.dll C:\\WINDOWS\\System32\\AMD\\nvml.dll C:\\Python311\\Scripts\\nvml.dll C:\\Python311\\nvml.dll C:\\WINDOWS\\system32\\nvml.dll C:\\WINDOWS\\nvml.dll C:\\WINDOWS\\System32\\Wbem\\nvml.dll C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\nvml.dll C:\\WINDOWS\\System32\\OpenSSH\\nvml.dll C:\\Program Files\\dotnet\\nvml.dll C:\\ProgramData\\chocolatey\\bin\\nvml.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\nvml.dll C:\\Program Files\\nodejs\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps\\nvml.dll C:\\Users\\lukas\\AppData\\Roaming\\npm\\nvml.dll C:\\Program Files\\cmder\\nvml.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\nvml.dll C:\\Program Files\\nodejs\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\nvml.dll c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\nvml.dll c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\nvml.dll C:\\Program Files\\Docker\\Docker\\resources\\bin\\nvml.dll C:\\Program Files\\Git\\cmd\\nvml.dll C:\\Program Files\\PuTTY\\nvml.dll C:\\Program Files\\PowerShell\\7\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps\\nvml.dll C:\\Users\\lukas\\AppData\\Roaming\\npm\\nvml.dll C:\\Program Files\\cmder\\nvml.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\nvml.dll C:\\Program Files\\nodejs\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\nvml.dll C:\\Users\\lukas\\.lmstudio\\bin\\nvml.dll 
C:\\Users\\lukas\\.dotnet\\tools\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\SST.opencode_Microsoft.Winget.Source_8wekyb3d8bbwe\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\mostlygeek.llama-swap_Microsoft.Winget.Source_8wekyb3d8bbwe\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\ggml.llamacpp_Microsoft.Winget.Source_8wekyb3d8bbwe\\nvml.dll C:\\Program Files\\Git\\mingw64\\bin\\nvml.dll C:\\Program Files\\Git\\usr\\bin\\nvml.dll C:\\Program Files\\cmder\\vendor\\bin\\nvml.dll c:\\Windows\\System32\\nvml.dll]"
time=2025-10-26T01:39:15.718+02:00 level=DEBUG source=gpu.go:577 msg="discovered GPU libraries" paths=[]
time=2025-10-26T01:39:15.718+02:00 level=DEBUG source=gpu.go:520 msg="Searching for GPU library" name=nvcuda.dll
time=2025-10-26T01:39:15.718+02:00 level=DEBUG source=gpu.go:544 msg="gpu library search" globs="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\nvcuda.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\Scripts\\nvcuda.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\nvcuda.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\nvcuda.dll C:\\WINDOWS\\System32\\AMD\\nvcuda.dll C:\\Python311\\Scripts\\nvcuda.dll C:\\Python311\\nvcuda.dll C:\\WINDOWS\\system32\\nvcuda.dll C:\\WINDOWS\\nvcuda.dll C:\\WINDOWS\\System32\\Wbem\\nvcuda.dll C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\nvcuda.dll C:\\WINDOWS\\System32\\OpenSSH\\nvcuda.dll C:\\Program Files\\dotnet\\nvcuda.dll C:\\ProgramData\\chocolatey\\bin\\nvcuda.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\nvcuda.dll C:\\Program Files\\nodejs\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps\\nvcuda.dll C:\\Users\\lukas\\AppData\\Roaming\\npm\\nvcuda.dll C:\\Program Files\\cmder\\nvcuda.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\nvcuda.dll C:\\Program Files\\nodejs\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\nvcuda.dll c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\nvcuda.dll c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\nvcuda.dll C:\\Program Files\\Docker\\Docker\\resources\\bin\\nvcuda.dll C:\\Program Files\\Git\\cmd\\nvcuda.dll C:\\Program Files\\PuTTY\\nvcuda.dll C:\\Program Files\\PowerShell\\7\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps\\nvcuda.dll C:\\Users\\lukas\\AppData\\Roaming\\npm\\nvcuda.dll C:\\Program Files\\cmder\\nvcuda.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\nvcuda.dll C:\\Program Files\\nodejs\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\nvcuda.dll 
C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\nvcuda.dll C:\\Users\\lukas\\.lmstudio\\bin\\nvcuda.dll C:\\Users\\lukas\\.dotnet\\tools\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\SST.opencode_Microsoft.Winget.Source_8wekyb3d8bbwe\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\mostlygeek.llama-swap_Microsoft.Winget.Source_8wekyb3d8bbwe\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\ggml.llamacpp_Microsoft.Winget.Source_8wekyb3d8bbwe\\nvcuda.dll C:\\Program Files\\Git\\mingw64\\bin\\nvcuda.dll C:\\Program Files\\Git\\usr\\bin\\nvcuda.dll C:\\Program Files\\cmder\\vendor\\bin\\nvcuda.dll c:\\windows\\system*\\nvcuda.dll]"
time=2025-10-26T01:39:15.721+02:00 level=DEBUG source=gpu.go:577 msg="discovered GPU libraries" paths=[]
time=2025-10-26T01:39:15.721+02:00 level=DEBUG source=gpu.go:520 msg="Searching for GPU library" name=cudart64_*.dll
time=2025-10-26T01:39:15.721+02:00 level=DEBUG source=gpu.go:544 msg="gpu library search" globs="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cudart64_*.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\Scripts\\cudart64_*.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\cudart64_*.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\cudart64_*.dll C:\\WINDOWS\\System32\\AMD\\cudart64_*.dll C:\\Python311\\Scripts\\cudart64_*.dll C:\\Python311\\cudart64_*.dll C:\\WINDOWS\\system32\\cudart64_*.dll C:\\WINDOWS\\cudart64_*.dll C:\\WINDOWS\\System32\\Wbem\\cudart64_*.dll C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\cudart64_*.dll C:\\WINDOWS\\System32\\OpenSSH\\cudart64_*.dll C:\\Program Files\\dotnet\\cudart64_*.dll C:\\ProgramData\\chocolatey\\bin\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\cudart64_*.dll C:\\Program Files\\nodejs\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Roaming\\npm\\cudart64_*.dll C:\\Program Files\\cmder\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\cudart64_*.dll C:\\Program Files\\nodejs\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\cudart64_*.dll c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\cudart64_*.dll c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\cudart64_*.dll C:\\Program Files\\Docker\\Docker\\resources\\bin\\cudart64_*.dll C:\\Program Files\\Git\\cmd\\cudart64_*.dll C:\\Program Files\\PuTTY\\cudart64_*.dll C:\\Program Files\\PowerShell\\7\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Roaming\\npm\\cudart64_*.dll C:\\Program Files\\cmder\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\cudart64_*.dll C:\\Program Files\\nodejs\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\cudart64_*.dll 
C:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\cudart64_*.dll C:\\Users\\lukas\\.lmstudio\\bin\\cudart64_*.dll C:\\Users\\lukas\\.dotnet\\tools\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\SST.opencode_Microsoft.Winget.Source_8wekyb3d8bbwe\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\mostlygeek.llama-swap_Microsoft.Winget.Source_8wekyb3d8bbwe\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\ggml.llamacpp_Microsoft.Winget.Source_8wekyb3d8bbwe\\cudart64_*.dll C:\\Program Files\\Git\\mingw64\\bin\\cudart64_*.dll C:\\Program Files\\Git\\usr\\bin\\cudart64_*.dll C:\\Program Files\\cmder\\vendor\\bin\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v*\\cudart64_*.dll c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v*\\bin\\cudart64_*.dll]"
time=2025-10-26T01:39:15.730+02:00 level=DEBUG source=gpu.go:577 msg="discovered GPU libraries" paths=[C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12\cudart64_12.dll]
cudaSetDevice err: 35
time=2025-10-26T01:39:15.732+02:00 level=DEBUG source=gpu.go:593 msg="Unable to load cudart library C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12\\cudart64_12.dll: your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2025-10-26T01:39:15.752+02:00 level=DEBUG source=amd_hip_windows.go:88 msg=hipDriverGetVersion version=60450101
time=2025-10-26T01:39:15.752+02:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm"
time=2025-10-26T01:39:15.752+02:00 level=DEBUG source=amd_common.go:44 msg="detected ROCM next to ollama executable C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm"
time=2025-10-26T01:39:15.754+02:00 level=DEBUG source=amd_windows.go:73 msg="detected hip devices" count=2
time=2025-10-26T01:39:15.754+02:00 level=DEBUG source=amd_windows.go:93 msg="hip device" id=0 name="AMD Radeon(TM) Graphics" gfx=gfx1036
time=2025-10-26T01:39:16.070+02:00 level=INFO source=amd_windows.go:128 msg="unsupported Radeon iGPU detected skipping" id=0 total="24.2 GiB"
time=2025-10-26T01:39:16.070+02:00 level=DEBUG source=amd_windows.go:93 msg="hip device" id=1 name="AMD Radeon RX 7900 XTX" gfx=gfx1100
time=2025-10-26T01:39:16.361+02:00 level=DEBUG source=amd_windows.go:147 msg="amdgpu is supported" gpu=1 gpu_type=gfx1100
time=2025-10-26T01:39:16.361+02:00 level=DEBUG source=amd_windows.go:150 msg="amdgpu memory" gpu=1 total="24.0 GiB"
time=2025-10-26T01:39:16.361+02:00 level=DEBUG source=amd_windows.go:151 msg="amdgpu memory" gpu=1 available="23.8 GiB"
time=2025-10-26T01:39:16.362+02:00 level=INFO source=types.go:131 msg="inference compute" id=0 library=rocm variant="" compute=gfx1100 driver=6.4 name="AMD Radeon RX 7900 XTX" total="24.0 GiB" available="23.8 GiB"
time=2025-10-26T01:39:32.045+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
[GIN] 2025/10/26 - 01:39:32 | 200 |     40.6336ms |       127.0.0.1 | POST     "/api/show"
time=2025-10-26T01:39:32.454+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
[GIN] 2025/10/26 - 01:39:32 | 200 |     37.5929ms |       127.0.0.1 | POST     "/api/show"
time=2025-10-26T01:39:49.787+02:00 level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="63.1 GiB" before.free="48.7 GiB" before.free_swap="49.0 GiB" now.total="63.1 GiB" now.free="46.7 GiB" now.free_swap="46.4 GiB"
time=2025-10-26T01:39:50.126+02:00 level=DEBUG source=amd_windows.go:198 msg="updating rocm free memory" gpu=0 name="AMD Radeon RX 7900 XTX" before="23.8 GiB" now="23.7 GiB"
time=2025-10-26T01:39:50.126+02:00 level=INFO source=sched.go:192 msg="one or more GPUs detected that are unable to accurately report free memory - disabling default concurrency"
time=2025-10-26T01:39:50.135+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-26T01:39:50.136+02:00 level=DEBUG source=sched.go:208 msg="loading first model" model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b
time=2025-10-26T01:39:50.170+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-26T01:39:50.171+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-10-26T01:39:50.171+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-10-26T01:39:50.171+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-10-26T01:39:50.171+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-10-26T01:39:50.173+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1
time=2025-10-26T01:39:50.173+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true
time=2025-10-26T01:39:50.173+02:00 level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="63.1 GiB" before.free="46.7 GiB" before.free_swap="46.4 GiB" now.total="63.1 GiB" now.free="46.7 GiB" now.free_swap="46.2 GiB"
time=2025-10-26T01:39:50.501+02:00 level=DEBUG source=amd_windows.go:198 msg="updating rocm free memory" gpu=0 name="AMD Radeon RX 7900 XTX" before="23.7 GiB" now="23.5 GiB"
time=2025-10-26T01:39:50.508+02:00 level=DEBUG source=server.go:324 msg="adding gpu library" path=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm
time=2025-10-26T01:39:50.508+02:00 level=DEBUG source=server.go:332 msg="adding gpu dependency paths" paths=[C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm]
time=2025-10-26T01:39:50.508+02:00 level=INFO source=server.go:399 msg="starting runner" cmd="C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\lukas\\.ollama\\models\\blobs\\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b --port 50373"
time=2025-10-26T01:39:50.509+02:00 level=DEBUG source=server.go:400 msg=subprocess HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_CONTEXT_LENGTH=16000 OLLAMA_DEBUG=1 OLLAMA_GPU_OVERHEAD=0 OLLAMA_HOST=127.0.0.1:11223 OLLAMA_LOAD_TIMEOUT=5m0s OLLAMA_MAX_LOADED_MODELS=1 OLLAMA_NUM_GPU=999 OLLAMA_NUM_PARALLEL=1 PATH="C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\Scripts;C:\\Program Files\\cmder\\vendor\\conemu-maximus5;C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu;C:\\WINDOWS\\System32\\AMD;C:\\Python311\\Scripts\\;C:\\Python311\\;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\dotnet\\;C:\\ProgramData\\chocolatey\\bin;C:\\Users\\lukas\\AppData\\Roaming\\nvm;C:\\Program Files\\nodejs;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\lukas\\AppData\\Roaming\\npm;C:\\Program Files\\cmder;C:\\Users\\lukas\\AppData\\Roaming\\nvm;C:\\Program Files\\nodejs;C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\PuTTY\\;C:\\Program Files\\PowerShell\\7\\;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\lukas\\AppData\\Roaming\\npm;C:\\Program Files\\cmder;C:\\Users\\lukas\\AppData\\Roaming\\nvm;C:\\Program Files\\nodejs;C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS 
Code\\bin;C:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama;C:\\Users\\lukas\\.lmstudio\\bin;C:\\Users\\lukas\\.dotnet\\tools;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\SST.opencode_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\mostlygeek.llama-swap_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\ggml.llamacpp_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Program Files\\Git\\mingw64\\bin;C:\\Program Files\\Git\\usr\\bin;C:\\Program Files\\cmder\\vendor\\bin;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama" OLLAMA_LIBRARY_PATH=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=1
time=2025-10-26T01:39:50.512+02:00 level=INFO source=server.go:672 msg="loading model" "model layers"=49 requested=-1
time=2025-10-26T01:39:50.513+02:00 level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="63.1 GiB" before.free="46.7 GiB" before.free_swap="46.2 GiB" now.total="63.1 GiB" now.free="46.7 GiB" now.free_swap="46.0 GiB"
time=2025-10-26T01:39:50.546+02:00 level=INFO source=runner.go:1252 msg="starting ollama engine"
time=2025-10-26T01:39:50.554+02:00 level=INFO source=runner.go:1287 msg="Server listening on 127.0.0.1:50373"
time=2025-10-26T01:39:50.828+02:00 level=DEBUG source=amd_windows.go:198 msg="updating rocm free memory" gpu=0 name="AMD Radeon RX 7900 XTX" before="23.5 GiB" now="23.4 GiB"
time=2025-10-26T01:39:50.829+02:00 level=INFO source=server.go:678 msg="system memory" total="63.1 GiB" free="46.7 GiB" free_swap="46.0 GiB"
time=2025-10-26T01:39:50.829+02:00 level=INFO source=server.go:686 msg="gpu memory" id=0 available="23.2 GiB" free="23.7 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-10-26T01:39:50.832+02:00 level=INFO source=runner.go:1171 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:8192 KvCacheType: NumThreads:12 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-26T01:39:50.847+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-26T01:39:50.848+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""
time=2025-10-26T01:39:50.848+02:00 level=INFO source=ggml.go:131 msg="" architecture=qwen3moe file_type=Q3_K_M name=Qwen3-Coder-30B-A3B-Instruct description="" num_tensors=579 num_key_values=45
time=2025-10-26T01:39:50.848+02:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32, ID: 0
load_backend: loaded ROCm backend from C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\ggml-hip.dll
load_backend: loaded CPU backend from C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-icelake.dll
time=2025-10-26T01:39:50.904+02:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm
time=2025-10-26T01:39:50.905+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2025-10-26T01:39:50.907+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-10-26T01:39:50.907+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-10-26T01:39:50.907+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-10-26T01:39:50.907+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-10-26T01:39:50.907+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1
time=2025-10-26T01:39:50.907+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true
time=2025-10-26T01:39:51.243+02:00 level=DEBUG source=ggml.go:794 msg="compute graph" nodes=3174 splits=2
time=2025-10-26T01:39:51.243+02:00 level=DEBUG source=backend.go:310 msg="model weights" device=ROCm0 size="12.7 GiB"
time=2025-10-26T01:39:51.243+02:00 level=DEBUG source=backend.go:315 msg="model weights" device=CPU size="166.9 MiB"
time=2025-10-26T01:39:51.243+02:00 level=DEBUG source=backend.go:321 msg="kv cache" device=ROCm0 size="768.0 MiB"
time=2025-10-26T01:39:51.246+02:00 level=DEBUG source=backend.go:332 msg="compute graph" device=ROCm0 size="552.0 MiB"
time=2025-10-26T01:39:51.246+02:00 level=DEBUG source=backend.go:337 msg="compute graph" device=CPU size="4.0 MiB"
time=2025-10-26T01:39:51.246+02:00 level=DEBUG source=backend.go:342 msg="total memory" size="14.1 GiB"
time=2025-10-26T01:39:51.246+02:00 level=DEBUG source=server.go:717 msg=memory success=true required.InputWeights=175030272U required.CPU.Graph=4194304U required.ROCm0.ID=0 required.ROCm0.Weights="[273368192U 300303616U 300303616U 297944320U 271205504U 273368192U 271205504U 273368192U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 297944320U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 297944320U 271205504U 271205504U 271205504U 271205504U 273368192U 271205504U 271205504U 271205504U 273368192U 273368192U 271205504U 273368192U 297944320U 297944320U 297944320U 300107008U 300303616U 301417728U 301417728U 255260672U]" required.ROCm0.Cache="[16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 0U]" required.ROCm0.Graph=578816128U
time=2025-10-26T01:39:51.247+02:00 level=DEBUG source=server.go:894 msg="available gpu" id=0 "available layer vram"="22.7 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="552.0 MiB"
time=2025-10-26T01:39:51.247+02:00 level=DEBUG source=server.go:728 msg="new layout created" layers="49[ID:0 Layers:49(0..48)]"
time=2025-10-26T01:39:51.248+02:00 level=INFO source=runner.go:1171 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:8192 KvCacheType: NumThreads:12 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-26T01:39:51.262+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-26T01:39:51.524+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-10-26T01:39:51.524+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-10-26T01:39:51.524+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-10-26T01:39:51.524+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-10-26T01:39:51.524+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1
time=2025-10-26T01:39:51.524+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true
time=2025-10-26T01:39:51.608+02:00 level=DEBUG source=ggml.go:794 msg="compute graph" nodes=3174 splits=2
time=2025-10-26T01:39:51.610+02:00 level=DEBUG source=backend.go:310 msg="model weights" device=ROCm0 size="12.7 GiB"
time=2025-10-26T01:39:51.610+02:00 level=DEBUG source=backend.go:315 msg="model weights" device=CPU size="166.9 MiB"
time=2025-10-26T01:39:51.610+02:00 level=DEBUG source=backend.go:321 msg="kv cache" device=ROCm0 size="768.0 MiB"
time=2025-10-26T01:39:51.610+02:00 level=DEBUG source=backend.go:332 msg="compute graph" device=ROCm0 size="552.0 MiB"
time=2025-10-26T01:39:51.610+02:00 level=DEBUG source=backend.go:337 msg="compute graph" device=CPU size="4.0 MiB"
time=2025-10-26T01:39:51.611+02:00 level=DEBUG source=backend.go:342 msg="total memory" size="14.1 GiB"
time=2025-10-26T01:39:51.611+02:00 level=DEBUG source=server.go:717 msg=memory success=true required.InputWeights=175030272U required.CPU.Graph=4194304U required.ROCm0.ID=0 required.ROCm0.Weights="[273368192U 300303616U 300303616U 297944320U 271205504U 273368192U 271205504U 273368192U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 297944320U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 297944320U 271205504U 271205504U 271205504U 271205504U 273368192U 271205504U 271205504U 271205504U 273368192U 273368192U 271205504U 273368192U 297944320U 297944320U 297944320U 300107008U 300303616U 301417728U 301417728U 255260672U]" required.ROCm0.Cache="[16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 0U]" required.ROCm0.Graph=578816128U
time=2025-10-26T01:39:51.611+02:00 level=DEBUG source=server.go:894 msg="available gpu" id=0 "available layer vram"="22.7 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="552.0 MiB"
time=2025-10-26T01:39:51.611+02:00 level=DEBUG source=server.go:728 msg="new layout created" layers="49[ID:0 Layers:49(0..48)]"
time=2025-10-26T01:39:51.612+02:00 level=INFO source=runner.go:1171 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:8192 KvCacheType: NumThreads:12 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-26T01:39:51.612+02:00 level=INFO source=ggml.go:487 msg="offloading 48 repeating layers to GPU"
time=2025-10-26T01:39:51.612+02:00 level=INFO source=ggml.go:493 msg="offloading output layer to GPU"
time=2025-10-26T01:39:51.612+02:00 level=INFO source=ggml.go:498 msg="offloaded 49/49 layers to GPU"
time=2025-10-26T01:39:51.613+02:00 level=INFO source=backend.go:310 msg="model weights" device=ROCm0 size="12.7 GiB"
time=2025-10-26T01:39:51.614+02:00 level=INFO source=backend.go:315 msg="model weights" device=CPU size="166.9 MiB"
time=2025-10-26T01:39:51.614+02:00 level=INFO source=backend.go:321 msg="kv cache" device=ROCm0 size="768.0 MiB"
time=2025-10-26T01:39:51.614+02:00 level=INFO source=backend.go:332 msg="compute graph" device=ROCm0 size="552.0 MiB"
time=2025-10-26T01:39:51.615+02:00 level=INFO source=backend.go:337 msg="compute graph" device=CPU size="4.0 MiB"
time=2025-10-26T01:39:51.615+02:00 level=INFO source=backend.go:342 msg="total memory" size="14.1 GiB"
time=2025-10-26T01:39:51.616+02:00 level=INFO source=sched.go:470 msg="loaded runners" count=1
time=2025-10-26T01:39:51.616+02:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-10-26T01:39:51.617+02:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
time=2025-10-26T01:39:51.617+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.00"
time=2025-10-26T01:39:51.871+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.05"
time=2025-10-26T01:39:52.126+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.09"
time=2025-10-26T01:39:52.378+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.14"
time=2025-10-26T01:39:52.631+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.18"
time=2025-10-26T01:39:52.885+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.22"
time=2025-10-26T01:39:53.138+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.27"
time=2025-10-26T01:39:53.393+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.31"
time=2025-10-26T01:39:53.655+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.36"
time=2025-10-26T01:39:53.912+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.41"
time=2025-10-26T01:39:54.165+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.45"
time=2025-10-26T01:39:54.420+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.49"
time=2025-10-26T01:39:54.674+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.54"
time=2025-10-26T01:39:54.927+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.59"
time=2025-10-26T01:39:55.183+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.63"
time=2025-10-26T01:39:55.440+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.67"
time=2025-10-26T01:39:55.694+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.72"
time=2025-10-26T01:39:55.951+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.77"
time=2025-10-26T01:39:56.208+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.81"
time=2025-10-26T01:39:56.465+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.85"
time=2025-10-26T01:39:56.731+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.90"
time=2025-10-26T01:39:56.995+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.94"
time=2025-10-26T01:39:57.247+02:00 level=DEBUG source=server.go:1295 msg="model load progress 0.98"
time=2025-10-26T01:39:57.346+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-10-26T01:39:57.499+02:00 level=INFO source=server.go:1289 msg="llama runner started in 6.99 seconds"
time=2025-10-26T01:39:57.499+02:00 level=DEBUG source=sched.go:482 msg="finished setting up" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.inference=rocm runner.devices=1 runner.size="14.1 GiB" runner.vram="14.1 GiB" runner.parallel=1 runner.pid=6692 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192
time=2025-10-26T01:39:57.541+02:00 level=DEBUG source=server.go:1388 msg="completion request" images=0 prompt=7795 format=""
time=2025-10-26T01:39:57.556+02:00 level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=0 prompt=2024 used=0 remaining=2024
[GIN] 2025/10/26 - 01:40:01 | 200 |   11.6872713s |       127.0.0.1 | POST     "/api/chat"
time=2025-10-26T01:40:01.415+02:00 level=DEBUG source=sched.go:490 msg="context for request finished"
time=2025-10-26T01:40:01.416+02:00 level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.inference=rocm runner.devices=1 runner.size="14.1 GiB" runner.vram="14.1 GiB" runner.parallel=1 runner.pid=6692 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192 duration=30m0s
time=2025-10-26T01:40:01.416+02:00 level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.inference=rocm runner.devices=1 runner.size="14.1 GiB" runner.vram="14.1 GiB" runner.parallel=1 runner.pid=6692 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192 refCount=0
```
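The log below shows the selection behavior working as intended: discovery finds two HIP devices, skips the unsupported Raphael iGPU (gfx1036), and keeps the RX 7900 XTX (gfx1100) as the only inference device. A minimal sketch of that filtering logic (hypothetical illustration, not Ollama's actual code; `SUPPORTED_GFX` is an assumed subset):

```python
# Hypothetical sketch of the device filtering seen in the 0.12.3 log:
# unsupported iGPU targets (e.g. gfx1036) are skipped, supported dGPU
# targets (e.g. gfx1100) are kept for inference.

SUPPORTED_GFX = {"gfx1030", "gfx1100", "gfx1101", "gfx1102"}  # illustrative subset

def pick_inference_gpus(hip_devices):
    """Return the subset of detected HIP devices usable for inference."""
    kept = []
    for dev in hip_devices:
        if dev["gfx"] not in SUPPORTED_GFX:
            # mirrors: msg="unsupported Radeon iGPU detected skipping"
            continue
        kept.append(dev)
    return kept

devices = [
    {"id": 0, "name": "AMD Radeon(TM) Graphics", "gfx": "gfx1036"},  # Raphael iGPU
    {"id": 1, "name": "AMD Radeon RX 7900 XTX", "gfx": "gfx1100"},   # dGPU
]

print(pick_inference_gpus(devices))
```

With the two devices from this log, only the gfx1100 entry survives the filter, matching the `inference compute ... compute=gfx1100` line in the output.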
<!-- gh-comment-id:3447868573 --> @AwarePL commented on GitHub (Oct 25, 2025):

Downgrading to 0.12.3 solved it!

```
time=2025-10-26T01:39:15.706+02:00 level=INFO source=routes.go:1475 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:16000 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11223 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\lukas\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-10-26T01:39:15.714+02:00 level=INFO source=images.go:518 msg="total blobs: 4"
time=2025-10-26T01:39:15.714+02:00 level=INFO source=images.go:525 msg="total unused blobs removed: 0"
time=2025-10-26T01:39:15.715+02:00 level=INFO source=routes.go:1528 msg="Listening on 127.0.0.1:11223 (version 0.12.3)"
time=2025-10-26T01:39:15.715+02:00 level=DEBUG source=sched.go:121 msg="starting llm scheduler"
time=2025-10-26T01:39:15.716+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-10-26T01:39:15.716+02:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-10-26T01:39:15.716+02:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=12 efficiency=0 threads=24
time=2025-10-26T01:39:15.716+02:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for 
NVIDIA"
time=2025-10-26T01:39:15.716+02:00 level=DEBUG source=gpu.go:520 msg="Searching for GPU library" name=nvml.dll
time=2025-10-26T01:39:15.716+02:00 level=DEBUG source=gpu.go:544 msg="gpu library search" globs="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\nvml.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\Scripts\\nvml.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\nvml.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\nvml.dll C:\\WINDOWS\\System32\\AMD\\nvml.dll C:\\Python311\\Scripts\\nvml.dll C:\\Python311\\nvml.dll C:\\WINDOWS\\system32\\nvml.dll C:\\WINDOWS\\nvml.dll C:\\WINDOWS\\System32\\Wbem\\nvml.dll C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\nvml.dll C:\\WINDOWS\\System32\\OpenSSH\\nvml.dll C:\\Program Files\\dotnet\\nvml.dll C:\\ProgramData\\chocolatey\\bin\\nvml.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\nvml.dll C:\\Program Files\\nodejs\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps\\nvml.dll C:\\Users\\lukas\\AppData\\Roaming\\npm\\nvml.dll C:\\Program Files\\cmder\\nvml.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\nvml.dll C:\\Program Files\\nodejs\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\nvml.dll c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\nvml.dll c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\nvml.dll C:\\Program Files\\Docker\\Docker\\resources\\bin\\nvml.dll C:\\Program Files\\Git\\cmd\\nvml.dll C:\\Program Files\\PuTTY\\nvml.dll C:\\Program Files\\PowerShell\\7\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps\\nvml.dll C:\\Users\\lukas\\AppData\\Roaming\\npm\\nvml.dll C:\\Program Files\\cmder\\nvml.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\nvml.dll C:\\Program Files\\nodejs\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\nvml.dll c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\nvml.dll C:\\Users\\lukas\\.lmstudio\\bin\\nvml.dll C:\\Users\\lukas\\.dotnet\\tools\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\SST.opencode_Microsoft.Winget.Source_8wekyb3d8bbwe\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\mostlygeek.llama-swap_Microsoft.Winget.Source_8wekyb3d8bbwe\\nvml.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\ggml.llamacpp_Microsoft.Winget.Source_8wekyb3d8bbwe\\nvml.dll C:\\Program Files\\Git\\mingw64\\bin\\nvml.dll C:\\Program Files\\Git\\usr\\bin\\nvml.dll C:\\Program Files\\cmder\\vendor\\bin\\nvml.dll c:\\Windows\\System32\\nvml.dll]"
time=2025-10-26T01:39:15.718+02:00 level=DEBUG source=gpu.go:577 msg="discovered GPU libraries" paths=[]
time=2025-10-26T01:39:15.718+02:00 level=DEBUG source=gpu.go:520 msg="Searching for GPU library" name=nvcuda.dll
time=2025-10-26T01:39:15.718+02:00 level=DEBUG source=gpu.go:544 msg="gpu library search" globs="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\nvcuda.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\Scripts\\nvcuda.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\nvcuda.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\nvcuda.dll C:\\WINDOWS\\System32\\AMD\\nvcuda.dll C:\\Python311\\Scripts\\nvcuda.dll C:\\Python311\\nvcuda.dll C:\\WINDOWS\\system32\\nvcuda.dll C:\\WINDOWS\\nvcuda.dll C:\\WINDOWS\\System32\\Wbem\\nvcuda.dll C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\nvcuda.dll C:\\WINDOWS\\System32\\OpenSSH\\nvcuda.dll C:\\Program Files\\dotnet\\nvcuda.dll C:\\ProgramData\\chocolatey\\bin\\nvcuda.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\nvcuda.dll C:\\Program Files\\nodejs\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps\\nvcuda.dll C:\\Users\\lukas\\AppData\\Roaming\\npm\\nvcuda.dll C:\\Program Files\\cmder\\nvcuda.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\nvcuda.dll C:\\Program Files\\nodejs\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\nvcuda.dll c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\nvcuda.dll c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\nvcuda.dll C:\\Program Files\\Docker\\Docker\\resources\\bin\\nvcuda.dll C:\\Program Files\\Git\\cmd\\nvcuda.dll C:\\Program Files\\PuTTY\\nvcuda.dll C:\\Program Files\\PowerShell\\7\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps\\nvcuda.dll C:\\Users\\lukas\\AppData\\Roaming\\npm\\nvcuda.dll C:\\Program Files\\cmder\\nvcuda.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\nvcuda.dll C:\\Program Files\\nodejs\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\nvcuda.dll C:\\Users\\lukas\\.lmstudio\\bin\\nvcuda.dll C:\\Users\\lukas\\.dotnet\\tools\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\SST.opencode_Microsoft.Winget.Source_8wekyb3d8bbwe\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\mostlygeek.llama-swap_Microsoft.Winget.Source_8wekyb3d8bbwe\\nvcuda.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\ggml.llamacpp_Microsoft.Winget.Source_8wekyb3d8bbwe\\nvcuda.dll C:\\Program Files\\Git\\mingw64\\bin\\nvcuda.dll C:\\Program Files\\Git\\usr\\bin\\nvcuda.dll C:\\Program Files\\cmder\\vendor\\bin\\nvcuda.dll c:\\windows\\system*\\nvcuda.dll]"
time=2025-10-26T01:39:15.721+02:00 level=DEBUG source=gpu.go:577 msg="discovered GPU libraries" paths=[]
time=2025-10-26T01:39:15.721+02:00 level=DEBUG source=gpu.go:520 msg="Searching for GPU library" name=cudart64_*.dll
time=2025-10-26T01:39:15.721+02:00 level=DEBUG source=gpu.go:544 msg="gpu library search" globs="[C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cudart64_*.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\Scripts\\cudart64_*.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\cudart64_*.dll C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\cudart64_*.dll C:\\WINDOWS\\System32\\AMD\\cudart64_*.dll C:\\Python311\\Scripts\\cudart64_*.dll C:\\Python311\\cudart64_*.dll C:\\WINDOWS\\system32\\cudart64_*.dll C:\\WINDOWS\\cudart64_*.dll C:\\WINDOWS\\System32\\Wbem\\cudart64_*.dll C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\cudart64_*.dll C:\\WINDOWS\\System32\\OpenSSH\\cudart64_*.dll C:\\Program Files\\dotnet\\cudart64_*.dll C:\\ProgramData\\chocolatey\\bin\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\cudart64_*.dll C:\\Program Files\\nodejs\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Roaming\\npm\\cudart64_*.dll C:\\Program Files\\cmder\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\cudart64_*.dll C:\\Program Files\\nodejs\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\cudart64_*.dll c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\cudart64_*.dll c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\cudart64_*.dll C:\\Program Files\\Docker\\Docker\\resources\\bin\\cudart64_*.dll C:\\Program Files\\Git\\cmd\\cudart64_*.dll C:\\Program Files\\PuTTY\\cudart64_*.dll C:\\Program Files\\PowerShell\\7\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Roaming\\npm\\cudart64_*.dll C:\\Program Files\\cmder\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Roaming\\nvm\\cudart64_*.dll C:\\Program Files\\nodejs\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\cudart64_*.dll C:\\Users\\lukas\\.lmstudio\\bin\\cudart64_*.dll C:\\Users\\lukas\\.dotnet\\tools\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\SST.opencode_Microsoft.Winget.Source_8wekyb3d8bbwe\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\mostlygeek.llama-swap_Microsoft.Winget.Source_8wekyb3d8bbwe\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\ggml.llamacpp_Microsoft.Winget.Source_8wekyb3d8bbwe\\cudart64_*.dll C:\\Program Files\\Git\\mingw64\\bin\\cudart64_*.dll C:\\Program Files\\Git\\usr\\bin\\cudart64_*.dll C:\\Program Files\\cmder\\vendor\\bin\\cudart64_*.dll C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v*\\cudart64_*.dll c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v*\\bin\\cudart64_*.dll]"
time=2025-10-26T01:39:15.730+02:00 level=DEBUG source=gpu.go:577 msg="discovered GPU libraries" paths=[C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12\cudart64_12.dll]
cudaSetDevice err: 35
time=2025-10-26T01:39:15.732+02:00 level=DEBUG source=gpu.go:593 msg="Unable to load cudart library C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12\\cudart64_12.dll: your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama"
time=2025-10-26T01:39:15.752+02:00 level=DEBUG source=amd_hip_windows.go:88 msg=hipDriverGetVersion version=60450101
time=2025-10-26T01:39:15.752+02:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm"
time=2025-10-26T01:39:15.752+02:00 level=DEBUG source=amd_common.go:44 msg="detected ROCM next to ollama executable C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm"
time=2025-10-26T01:39:15.754+02:00 level=DEBUG source=amd_windows.go:73 msg="detected hip devices" count=2
time=2025-10-26T01:39:15.754+02:00 level=DEBUG source=amd_windows.go:93 msg="hip device" id=0 name="AMD Radeon(TM) Graphics" gfx=gfx1036
time=2025-10-26T01:39:16.070+02:00 level=INFO source=amd_windows.go:128 msg="unsupported Radeon iGPU detected skipping" id=0 total="24.2 GiB"
time=2025-10-26T01:39:16.070+02:00 level=DEBUG source=amd_windows.go:93 msg="hip device" id=1 name="AMD Radeon RX 7900 XTX" gfx=gfx1100
time=2025-10-26T01:39:16.361+02:00 level=DEBUG source=amd_windows.go:147 msg="amdgpu is supported" gpu=1 gpu_type=gfx1100
time=2025-10-26T01:39:16.361+02:00 level=DEBUG source=amd_windows.go:150 msg="amdgpu memory" gpu=1 total="24.0 GiB"
time=2025-10-26T01:39:16.361+02:00 level=DEBUG source=amd_windows.go:151 msg="amdgpu memory" gpu=1 available="23.8 GiB"
time=2025-10-26T01:39:16.362+02:00 level=INFO source=types.go:131 msg="inference compute" id=0 library=rocm variant="" compute=gfx1100 driver=6.4 name="AMD Radeon RX 7900 XTX" total="24.0 GiB" available="23.8 GiB"
time=2025-10-26T01:39:32.045+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
[GIN] 2025/10/26 - 01:39:32 | 200 | 40.6336ms | 127.0.0.1 | POST "/api/show"
time=2025-10-26T01:39:32.454+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
[GIN] 2025/10/26 - 01:39:32 | 200 | 37.5929ms | 127.0.0.1 | POST "/api/show"
time=2025-10-26T01:39:49.787+02:00 level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="63.1 GiB" before.free="48.7 GiB" before.free_swap="49.0 GiB" now.total="63.1 GiB" now.free="46.7 GiB" now.free_swap="46.4 GiB"
time=2025-10-26T01:39:50.126+02:00 level=DEBUG source=amd_windows.go:198 msg="updating rocm free memory" gpu=0 name="AMD Radeon RX 7900 XTX" before="23.8 GiB" now="23.7 GiB"
time=2025-10-26T01:39:50.126+02:00 level=INFO source=sched.go:192 msg="one or more GPUs detected that are unable to accurately report free memory - disabling default concurrency"
time=2025-10-26T01:39:50.135+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-26T01:39:50.136+02:00 level=DEBUG source=sched.go:208 msg="loading first model" model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b
time=2025-10-26T01:39:50.170+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-26T01:39:50.171+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-10-26T01:39:50.171+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-10-26T01:39:50.171+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-10-26T01:39:50.171+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-10-26T01:39:50.173+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1
time=2025-10-26T01:39:50.173+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true
time=2025-10-26T01:39:50.173+02:00 level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="63.1 GiB" before.free="46.7 GiB" before.free_swap="46.4 GiB" now.total="63.1 GiB" now.free="46.7 GiB" now.free_swap="46.2 GiB"
time=2025-10-26T01:39:50.501+02:00 level=DEBUG source=amd_windows.go:198 msg="updating rocm free memory" gpu=0 name="AMD Radeon RX 7900 XTX" before="23.7 GiB" now="23.5 GiB"
time=2025-10-26T01:39:50.508+02:00 level=DEBUG source=server.go:324 msg="adding gpu library" path=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm
time=2025-10-26T01:39:50.508+02:00 level=DEBUG source=server.go:332 msg="adding gpu dependency paths" paths=[C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm]
time=2025-10-26T01:39:50.508+02:00 level=INFO source=server.go:399 msg="starting runner" cmd="C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\lukas\\.ollama\\models\\blobs\\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b --port 50373"
time=2025-10-26T01:39:50.509+02:00 level=DEBUG source=server.go:400 msg=subprocess HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" OLLAMA_CONTEXT_LENGTH=16000 OLLAMA_DEBUG=1 OLLAMA_GPU_OVERHEAD=0 OLLAMA_HOST=127.0.0.1:11223 OLLAMA_LOAD_TIMEOUT=5m0s OLLAMA_MAX_LOADED_MODELS=1 OLLAMA_NUM_GPU=999 OLLAMA_NUM_PARALLEL=1 PATH="C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu\\Scripts;C:\\Program Files\\cmder\\vendor\\conemu-maximus5;C:\\Program Files\\cmder\\vendor\\conemu-maximus5\\ConEmu;C:\\WINDOWS\\System32\\AMD;C:\\Python311\\Scripts\\;C:\\Python311\\;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files\\dotnet\\;C:\\ProgramData\\chocolatey\\bin;C:\\Users\\lukas\\AppData\\Roaming\\nvm;C:\\Program Files\\nodejs;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\lukas\\AppData\\Roaming\\npm;C:\\Program Files\\cmder;C:\\Users\\lukas\\AppData\\Roaming\\nvm;C:\\Program Files\\nodejs;C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;c:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\PuTTY\\;C:\\Program Files\\PowerShell\\7\\;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\lukas\\AppData\\Roaming\\npm;C:\\Program Files\\cmder;C:\\Users\\lukas\\AppData\\Roaming\\nvm;C:\\Program Files\\nodejs;C:\\Users\\lukas\\AppData\\Local\\Programs\\Microsoft VS Code\\bin;C:\\Users\\lukas\\AppData\\Local\\Programs\\cursor\\resources\\app\\bin;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama;C:\\Users\\lukas\\.lmstudio\\bin;C:\\Users\\lukas\\.dotnet\\tools;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\SST.opencode_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\mostlygeek.llama-swap_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Users\\lukas\\AppData\\Local\\Microsoft\\WinGet\\Packages\\ggml.llamacpp_Microsoft.Winget.Source_8wekyb3d8bbwe;C:\\Program Files\\Git\\mingw64\\bin;C:\\Program Files\\Git\\usr\\bin;C:\\Program Files\\cmder\\vendor\\bin;C:\\Users\\lukas\\AppData\\Local\\Programs\\Ollama\\lib\\ollama" OLLAMA_LIBRARY_PATH=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=1
time=2025-10-26T01:39:50.512+02:00 level=INFO source=server.go:672 msg="loading model" "model layers"=49 requested=-1
time=2025-10-26T01:39:50.513+02:00 level=DEBUG source=gpu.go:410 msg="updating system memory data" before.total="63.1 GiB" before.free="46.7 GiB" before.free_swap="46.2 GiB" now.total="63.1 GiB" now.free="46.7 GiB" now.free_swap="46.0 GiB"
time=2025-10-26T01:39:50.546+02:00 level=INFO source=runner.go:1252 msg="starting ollama engine"
time=2025-10-26T01:39:50.554+02:00 level=INFO source=runner.go:1287 msg="Server listening on 127.0.0.1:50373"
time=2025-10-26T01:39:50.828+02:00 level=DEBUG source=amd_windows.go:198 msg="updating rocm free memory" gpu=0 name="AMD Radeon RX 7900 XTX" before="23.5 GiB" now="23.4 GiB"
time=2025-10-26T01:39:50.829+02:00 level=INFO source=server.go:678 msg="system memory" total="63.1 GiB" free="46.7 GiB" free_swap="46.0 GiB"
time=2025-10-26T01:39:50.829+02:00 level=INFO source=server.go:686 msg="gpu memory" id=0 available="23.2 GiB" free="23.7 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-10-26T01:39:50.832+02:00 level=INFO source=runner.go:1171 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:8192 KvCacheType: NumThreads:12 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-26T01:39:50.847+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-26T01:39:50.848+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""
time=2025-10-26T01:39:50.848+02:00 level=INFO source=ggml.go:131 msg="" architecture=qwen3moe file_type=Q3_K_M name=Qwen3-Coder-30B-A3B-Instruct description="" num_tensors=579 num_key_values=45
time=2025-10-26T01:39:50.848+02:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32, ID: 0
load_backend: loaded ROCm backend from C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\ggml-hip.dll
load_backend: loaded CPU backend from C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-icelake.dll
time=2025-10-26T01:39:50.904+02:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\lukas\AppData\Local\Programs\Ollama\lib\ollama\rocm
time=2025-10-26T01:39:50.905+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2025-10-26T01:39:50.907+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-10-26T01:39:50.907+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-10-26T01:39:50.907+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-10-26T01:39:50.907+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-10-26T01:39:50.907+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1
time=2025-10-26T01:39:50.907+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true
time=2025-10-26T01:39:51.243+02:00 level=DEBUG source=ggml.go:794 msg="compute graph" nodes=3174 splits=2
time=2025-10-26T01:39:51.243+02:00 level=DEBUG source=backend.go:310 msg="model weights" device=ROCm0 size="12.7 GiB"
time=2025-10-26T01:39:51.243+02:00 level=DEBUG source=backend.go:315 msg="model weights" device=CPU size="166.9 MiB"
time=2025-10-26T01:39:51.243+02:00 level=DEBUG source=backend.go:321 msg="kv cache" device=ROCm0 size="768.0 MiB"
time=2025-10-26T01:39:51.246+02:00 level=DEBUG source=backend.go:332 msg="compute graph" device=ROCm0 size="552.0 MiB"
time=2025-10-26T01:39:51.246+02:00 level=DEBUG source=backend.go:337 msg="compute graph" device=CPU size="4.0 MiB"
time=2025-10-26T01:39:51.246+02:00 level=DEBUG source=backend.go:342 msg="total memory" size="14.1 GiB"
time=2025-10-26T01:39:51.246+02:00 level=DEBUG source=server.go:717 msg=memory success=true required.InputWeights=175030272U required.CPU.Graph=4194304U required.ROCm0.ID=0 required.ROCm0.Weights="[273368192U 300303616U 300303616U 297944320U 271205504U 273368192U 271205504U 273368192U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 297944320U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 297944320U 271205504U 271205504U 271205504U 271205504U 273368192U 271205504U 271205504U 271205504U 273368192U 273368192U 271205504U 273368192U 297944320U 297944320U 297944320U 300107008U 300303616U 301417728U 301417728U 255260672U]" required.ROCm0.Cache="[16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 0U]" required.ROCm0.Graph=578816128U
time=2025-10-26T01:39:51.247+02:00 level=DEBUG source=server.go:894 msg="available gpu" id=0 "available layer vram"="22.7 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="552.0 MiB"
time=2025-10-26T01:39:51.247+02:00 level=DEBUG source=server.go:728 msg="new layout created" layers="49[ID:0 Layers:49(0..48)]"
time=2025-10-26T01:39:51.248+02:00 level=INFO source=runner.go:1171 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:8192 KvCacheType: NumThreads:12 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-26T01:39:51.262+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-26T01:39:51.524+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.pooling_type default=0
time=2025-10-26T01:39:51.524+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-10-26T01:39:51.524+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-10-26T01:39:51.524+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-10-26T01:39:51.524+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.rope.scaling.factor default=1
time=2025-10-26T01:39:51.524+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=qwen3moe.norm_top_k_prob default=true
time=2025-10-26T01:39:51.608+02:00 level=DEBUG source=ggml.go:794 msg="compute graph" nodes=3174 splits=2
time=2025-10-26T01:39:51.610+02:00 level=DEBUG source=backend.go:310 msg="model weights" device=ROCm0 size="12.7 GiB"
time=2025-10-26T01:39:51.610+02:00 level=DEBUG source=backend.go:315 msg="model weights" device=CPU size="166.9 MiB"
time=2025-10-26T01:39:51.610+02:00 level=DEBUG source=backend.go:321 msg="kv cache" device=ROCm0 size="768.0 MiB"
time=2025-10-26T01:39:51.610+02:00 level=DEBUG source=backend.go:332 msg="compute graph" device=ROCm0 size="552.0 MiB"
time=2025-10-26T01:39:51.610+02:00 level=DEBUG source=backend.go:337 msg="compute graph" device=CPU size="4.0 MiB"
time=2025-10-26T01:39:51.611+02:00 level=DEBUG source=backend.go:342 msg="total memory" size="14.1 GiB"
time=2025-10-26T01:39:51.611+02:00 level=DEBUG source=server.go:717 msg=memory success=true required.InputWeights=175030272U required.CPU.Graph=4194304U required.ROCm0.ID=0 required.ROCm0.Weights="[273368192U 300303616U 300303616U 297944320U 271205504U 273368192U 271205504U 273368192U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 297944320U 271205504U 271205504U 271205504U 271205504U 271205504U 271205504U 297944320U 271205504U 271205504U 271205504U 271205504U 273368192U 271205504U 271205504U 271205504U 273368192U 273368192U 271205504U 273368192U 297944320U 297944320U 297944320U 300107008U 300303616U 301417728U 301417728U 255260672U]" required.ROCm0.Cache="[16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 16777216U 0U]" required.ROCm0.Graph=578816128U
time=2025-10-26T01:39:51.611+02:00 level=DEBUG source=server.go:894 msg="available gpu" id=0 "available layer vram"="22.7 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="552.0 MiB"
time=2025-10-26T01:39:51.611+02:00 level=DEBUG source=server.go:728 msg="new layout created" layers="49[ID:0 Layers:49(0..48)]"
runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.inference=rocm runner.devices=1 runner.size="14.1 GiB" runner.vram="14.1 GiB" runner.parallel=1 runner.pid=6692 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192 time=2025-10-26T01:39:57.541+02:00 level=DEBUG source=server.go:1388 msg="completion request" images=0 prompt=7795 format="" time=2025-10-26T01:39:57.556+02:00 level=DEBUG source=cache.go:142 msg="loading cache slot" id=0 cache=0 prompt=2024 used=0 remaining=2024 [GIN] 2025/10/26 - 01:40:01 | 200 | 11.6872713s | 127.0.0.1 | POST "/api/chat" time=2025-10-26T01:40:01.415+02:00 level=DEBUG source=sched.go:490 msg="context for request finished" time=2025-10-26T01:40:01.416+02:00 level=DEBUG source=sched.go:286 msg="runner with non-zero duration has gone idle, adding timer" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.inference=rocm runner.devices=1 runner.size="14.1 GiB" runner.vram="14.1 GiB" runner.parallel=1 runner.pid=6692 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192 duration=30m0s time=2025-10-26T01:40:01.416+02:00 level=DEBUG source=sched.go:304 msg="after processing request finished event" runner.name=hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL runner.inference=rocm runner.devices=1 runner.size="14.1 GiB" runner.vram="14.1 GiB" runner.parallel=1 runner.pid=6692 runner.model=C:\Users\lukas\.ollama\models\blobs\sha256-69cd7578d77dffc0b17e34bad9ef998d08ae0e20ccceef21bad4e7eb3d8c553b runner.num_ctx=8192 refCount=0```
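As an editorial sanity check (not part of the original issue), the per-device sizes reported in the log tail above are internally consistent with the logged total. All input figures below are copied from the `backend.go` and `server.go` log lines:

```python
# Cross-check of the memory accounting reported in the log above.
GIB = 1024 ** 3
MIB = 1024 ** 2

# Reported per-device sizes (already rounded in the log):
weights_gpu = 12.7 * GIB    # "model weights" device=ROCm0
weights_cpu = 166.9 * MIB   # "model weights" device=CPU
kv_cache    = 768.0 * MIB   # "kv cache" device=ROCm0
graph_gpu   = 552.0 * MIB   # "compute graph" device=ROCm0
graph_cpu   = 4.0 * MIB     # "compute graph" device=CPU

total_gib = (weights_gpu + weights_cpu + kv_cache + graph_gpu + graph_cpu) / GIB
# ~14.2 GiB from the rounded inputs, consistent with the logged
# total memory size="14.1 GiB".
print(f"total = {total_gib:.1f} GiB")

# The KV cache is exact: the 48 per-layer entries of 16777216 bytes
# (16 MiB each) in required.ROCm0.Cache sum to the logged 768.0 MiB.
kv_exact_mib = 48 * 16777216 / MIB
print(f"kv cache = {kv_exact_mib:.1f} MiB")
```

The arithmetic confirms the runner really did account for a full 49/49-layer offload to the ROCm device; the reported fallback to CPU is not visible in this particular load.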

@ganakee commented on GitHub (Oct 29, 2025):

0.12.7-rc0 still broken.

The new 0.12.7-rc0 still has this issue for me on Linux with an AMD RX 6650: the GPU is ignored in 0.12.7-rc0.

Downgraded to 0.12.2 as a workaround for now.


@dhiltgen commented on GitHub (Oct 29, 2025):

@ganakee sorry to hear that. Please share an updated log with OLLAMA_DEBUG=2 set so we can take a look.


Reference: github-starred/ollama#54971