Mirror of https://github.com/ollama/ollama.git
Synced 2026-05-06 16:11:34 -05:00
[GH-ISSUE #12564] Doesn't offload any layer into GPU RAM since 0.12.4 (AMD RX 7900 XTX on Windows) #54848
Closed
opened 2026-04-29 07:36:45 -05:00 by GiteaMirror · 41 comments
Originally created by @jack-running on GitHub (Oct 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12564
Originally assigned to: @dhiltgen on GitHub.
What is the issue?
On Ollama 0.12.3 it works fine; on 0.12.4 and 0.12.5 the AMD GPU is recognized but not used at all, and inference runs only on the CPU.
Relevant log output
OS
Windows
GPU
AMD
CPU
AMD
Ollama version
0.12.5
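A quick way to confirm the symptom from the command line is ollama ps, which reports how a loaded model is split between CPU and GPU (illustrative output; the model name and values below are placeholders):
ollama ps
NAME            ID              SIZE      PROCESSOR    UNTIL
llama3.2:3b     a80c4f17acd5    4.0 GB    100% CPU     4 minutes from now
A healthy run on this card would show 100% GPU instead.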
@jessegross commented on GitHub (Oct 10, 2025):
With 0.12.5:
On startup:
time=2025-10-11T00:03:35.623+02:00 level=INFO source=types.go:112 msg="inference compute" id=0 library=ROCm compute=gfx1100 name=ROCm1 description="AMD Radeon RX 7900 XTX" libdirs=ollama,rocm driver=60450.10 pci_id=03:00.0 type=discrete total="24.0 GiB" available="23.6 GiB"
We detect 23.6 GB of available VRAM.
Right before the model is loaded:
time=2025-10-11T00:03:36.242+02:00 level=INFO source=server.go:689 msg="gpu memory" id=0 library=ROCm available="370.0 MiB" free="827.0 MiB" minimum="457.0 MiB" overhead="0 B"
We only see 827 MB of free VRAM. What does Windows Task Manager show right before you launch the model?
We have improved free memory reporting for AMD on Windows with this version - it could be buggy or it could be more accurately reflecting reality.
@dhiltgen commented on GitHub (Oct 10, 2025):
If Task Manager claims much more VRAM is available, could you try again with
$env:OLLAMA_DEBUG="1"
so we can see more details in the log?
@dhiltgen commented on GitHub (Oct 10, 2025):
Actually we may need
$env:OLLAMA_DEBUG="2"
to see why the AMD GPU VRAM reporting is getting things wrong. The easiest way to get this is to quit Ollama from the system tray, then in a PowerShell terminal run:
Then in another terminal run:
@StrykeSlammerII commented on GitHub (Oct 10, 2025):
Similar issue here (AMD card, Manjaro/Linux, Intel CPU), just started after upgrading ollama to 0.12.4
I typically use
$ OLLAMA_FLASH_ATTENTION=1 ollama start
and (in 0.12.4) that log ends showing 0 VRAM regardless of what other programs I have running.
log.txt
Not sure whether this is different enough for a new ticket; I'll be happy to submit one if requested.
@esmorun commented on GitHub (Oct 11, 2025):
Same issue on a 9070 XT after upgrading to v0.12.5; downgrading to v0.12.3 solved it. I noticed this line claiming there's no free VRAM:
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon RX 9070 XT) (0000:03:00.0) - 0 MiB free
time=2025-10-11T113353.627+0200 lev.txt
@inforithmics commented on GitHub (Oct 12, 2025):
I had a look into the log and saw the following line:
runner exited" OLLAMA_LIBRARY_PATH=[/usr/lib/ollama] extra_envs="[GGML_CUDA_INIT=1 ROCR_VISIBLE_DEVICES=GPU-b3d5d3574c66244c]
ROCR_VISIBLE_DEVICES is initialized with a GPU UUID,
but it only supports numeric indices (0, 1, 2, ...), so the device is not found and 0 available memory is returned. I had to fix this in the Vulkan pull request to get correct sizes.
Replace it with:
Add the following to bootstrapDevices:
Maybe a graphics driver change suddenly returned the UUID and it stopped working, or GPU UUIDs stopped working and only device IDs (0, 1, 2) work.
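A minimal Go sketch of the idea (hypothetical types and names, not the actual ollama code): resolve the GPU UUID to the numeric index that ROCR_VISIBLE_DEVICES reliably accepts before spawning the runner.
package main

import (
	"fmt"
	"strconv"
)

// deviceInfo is a stand-in for the runner's enumerated device record
// (hypothetical; the real struct lives in ollama's discovery code).
type deviceInfo struct {
	ID string // e.g. "GPU-b3d5d3574c66244c"
}

// ordinalFor maps a GPU UUID to its enumeration index, since some ROCm
// driver builds ignore UUIDs in ROCR_VISIBLE_DEVICES and then report
// 0 bytes of available memory for the "missing" device.
func ordinalFor(devices []deviceInfo, uuid string) (string, error) {
	for i, d := range devices {
		if d.ID == uuid {
			return strconv.Itoa(i), nil
		}
	}
	return "", fmt.Errorf("GPU %s not found among %d devices", uuid, len(devices))
}

func main() {
	devs := []deviceInfo{{ID: "GPU-b3d5d3574c66244c"}}
	idx, err := ordinalFor(devs, "GPU-b3d5d3574c66244c")
	if err != nil {
		panic(err)
	}
	// Export the numeric form instead of the UUID:
	fmt.Println("ROCR_VISIBLE_DEVICES=" + idx) // prints ROCR_VISIBLE_DEVICES=0
}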
@dhiltgen commented on GitHub (Oct 12, 2025):
@esmorun and anyone else facing this issue, please run the server with OLLAMA_DEBUG="2" and share the startup logs up to the point of "inference compute", or to where the system is reporting 0 MB free when loading a model, so we can see what the problem is.
@StrykeSlammerII commented on GitHub (Oct 13, 2025):
Mine reports 0 MB free before loading a model, just when starting the server. Attached below is a log with OLLAMA_DEBUG="2" up to:
source=sched.go:493 msg="finished setting up" runner.name=registry.ollama.ai/library/PG-snow:latest runner.size="16.2 GiB" runner.vram="0 B"
There's a SIGSEGV before the model is loaded:
Not sure if that's relevant, as it still proceeds to find the correct gfx1200 device,
though it reports 0 MiB free.
Loading the model comes later and confirms 0 layers offloaded to GPU (which is atypical).
I'm running Ollama 0.12.5 from the CLI, loading the model via open-webui.
$ uname -r
6.12.51-1-MANJARO
$ ollama -v
ollama version is 0.12.5
Full log:
log.txt
@geminigeek commented on GitHub (Oct 13, 2025):
Same issue; it works with
ollama/ollama:0.12.3-rocm
with GPU AMD 5600G.
@dhiltgen commented on GitHub (Oct 13, 2025):
Thanks for the logs @StrykeSlammerII - it looks like your failure may be different from @jack-running's. Did prior versions of Ollama work correctly on your GPU? I don't have a matching GPU to test, but an RX 9070 (gfx1201) seems to be working OK on Linux. Which GPU model do you have? You may be dependent on #10676, which updates the ROCm version we're using for the official binary releases.
@StrykeSlammerII commented on GitHub (Oct 13, 2025):
Yes, Ollama was working fine until I updated recently--sorry I don't know the exact version that worked correctly.
My GPU is a RX 9060 XT.
Should I make a new issue so as not to clutter this one?
@dhiltgen commented on GitHub (Oct 13, 2025):
Yes please.
If possible, please try to share logs from a prior run that did bind to the GPU as well.
The new discovery code in 0.12.5+ tries harder to verify that the GPU will work properly, where older versions would sometimes crash for some models during inference. I'm not sure, but if you only used certain models before, it's possible you never exercised the code paths that would have led to a crash, which the new code now tries to verify at startup.
One other thing to try is setting
HSA_OVERRIDE_GFX_VERSION_0=12.0.1
and see if that changes behavior at all.
@thk-socal commented on GitHub (Oct 17, 2025):
I have the same type of issue running Docker on Ubuntu 24.04. Running 0.12.0-rocm was fine, and now with the last few releases it will NOT work with my GPU. I have even pinned it directly to the GPU by ID and told it not to use discovery. It defaults back to CPU mode.
@thk-socal commented on GitHub (Oct 17, 2025):
Here is 0.12.6-rocm
@thk-socal commented on GitHub (Oct 17, 2025):
and here is 0.12.0-rocm working. The ONLY difference is the docker image I pull.
@jack-running commented on GitHub (Oct 17, 2025):
My apologies for the delay in my response.
The issue is still there even with Ollama 0.12.6.
The startup logs:
and then the logs running the inference:
@dhiltgen commented on GitHub (Oct 17, 2025):
@jack-running the GPU discovery output looks like it found your discrete GPU. Could you include a bit more of the portion of the log you omitted in the "...", which looks something like this?
@dhiltgen commented on GitHub (Oct 17, 2025):
@thk-socal can you try running with OLLAMA_DEBUG=2 and not setting any overrides for visible devices or the LLM Library to use, and if it still doesn't find your GPU, share the logs up to the point of "inference compute" reporting CPU?
@thk-socal commented on GitHub (Oct 17, 2025):
I removed all the previous environment variables and mappings that I had been using for a long time. I went very generic, and now it appears to be doing discovery correctly, so the new auto-configuration is working well. The only environment variable I have set now is TZ, plus one volume mapped into the Docker container for the ollama root folder.
Tried 0.12.3, 0.12.5, and 0.12.6, all now identifying the GPU successfully, but the models will NOT load and run. I'll gather that data later when I have a moment.
@thk-socal commented on GitHub (Oct 17, 2025):
and now back to this:
@dhiltgen commented on GitHub (Oct 17, 2025):
@thk-socal it looks like ROCm is taking a really long time to initialize your GPU and occasionally hitting a timeout. I'll look at increasing the timeout and see if we can do anything to speed it up.
#12681 should mitigate this while we look to find ways to make it faster.
Until that ships in a future release, my best guess is that the GPU has gone into a low-power/idle state and takes a long time to warm back up. If you can find a way to warm it up before Ollama starts, or immediately restart Ollama, that should work around this until then.
@thk-socal commented on GitHub (Oct 17, 2025):
I have also rebuilt the host from scratch now and will watch how it does. 24.04.03 with 6.14 HWE kernel and 7.0.2 drivers installed on the host. New DEBUG logs from first run.
time=2025-10-17T20:38:00.100Z level=INFO source=routes.go:1511 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG-4 OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" time=2025-10-17T20:38:00.101Z level=INFO source=images.go:522 msg="total blobs: 0" time=2025-10-17T20:38:00.101Z level=INFO source=images.go:529 msg="total unused blobs removed: 0" time=2025-10-17T20:38:00.101Z level=INFO source=routes.go:1564 msg="Listening on [::]:11434 (version 0.12.6)" time=2025-10-17T20:38:00.101Z level=DEBUG source=sched.go:123 msg="starting llm scheduler" time=2025-10-17T20:38:00.101Z level=INFO source=runner.go:80 msg="discovering available GPUs..." time=2025-10-17T20:38:00.101Z level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=[] time=2025-10-17T20:38:00.101Z level=TRACE source=runner.go:529 msg="starting runner for device discovery" env="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOSTNAME=compute-01 TZ=America/Los_Angeles OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama:/usr/lib/ollama/rocm:/usr/lib/ollama/rocm:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 NVIDIA_DRIVER_CAPABILITIES=compute,utility NVIDIA_VISIBLE_DEVICES=all OLLAMA_HOST=0.0.0.0:11434 HOME=/root OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm]" cmd="/usr/bin/ollama runner --ollama-engine --port 46473" time=2025-10-17T20:38:00.109Z level=INFO source=runner.go:1332 msg="starting ollama engine" time=2025-10-17T20:38:00.109Z level=INFO source=runner.go:1367 msg="Server listening on 127.0.0.1:46473" time=2025-10-17T20:38:00.114Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-10-17T20:38:00.114Z level=DEBUG source=gguf.go:578 msg=general.architecture type=string time=2025-10-17T20:38:00.114Z level=DEBUG source=gguf.go:578 msg=tokenizer.ggml.model type=string time=2025-10-17T20:38:00.114Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-10-17T20:38:00.114Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.file_type default=0 time=2025-10-17T20:38:00.114Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default="" time=2025-10-17T20:38:00.114Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default="" time=2025-10-17T20:38:00.114Z level=INFO source=ggml.go:134 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3 time=2025-10-17T20:38:00.114Z level=DEBUG 
source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so time=2025-10-17T20:38:00.120Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/rocm /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1100 (0x1100), VMM: no, Wave Size: 32, ID: GPU-44d96d7bf1798b3e load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so time=2025-10-17T20:38:01.535Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-10-17T20:38:01.535Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.pooling_type default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.expert_count default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}" time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}" time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}" time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}" time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.pre default="" time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.embedding_length default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count_kv default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.key_length default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.dimension_count default=0 time=2025-10-17T20:38:01.536Z 
level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.freq_base default=100000 time=2025-10-17T20:38:01.536Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.scaling.factor default=1 time=2025-10-17T20:38:01.536Z level=DEBUG source=runner.go:1307 msg="dummy model load took" duration=1.423881543s time=2025-10-17T20:38:01.536Z level=DEBUG source=runner.go:1312 msg="gathering device infos took" duration=26.978µs time=2025-10-17T20:38:01.536Z level=TRACE source=runner.go:548 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices="[{DeviceID:{ID:GPU-44d96d7bf1798b3e Library:ROCm} Name:ROCm0 Description:AMD Radeon Graphics FilteredID: Integrated:false PCIID:06:00.0 TotalMemory:25753026560 FreeMemory:25715277824 ComputeMajor:17 ComputeMinor:0 DriverMajor:60342 DriverMinor:13 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/rocm]}]" time=2025-10-17T20:38:01.536Z level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=1.43527571s OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=[] time=2025-10-17T20:38:01.536Z level=DEBUG source=runner.go:118 msg="filtering out unsupported or overlapping GPU library combinations" count=1 time=2025-10-17T20:38:01.536Z level=DEBUG source=runner.go:130 msg="verifying GPU is supported" library=/usr/lib/ollama/rocm description="AMD Radeon Graphics" compute=gfx1100 pci_id=06:00.0 time=2025-10-17T20:38:01.536Z level=DEBUG source=runner.go:448 msg="spawning runner with" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs="[GGML_CUDA_INIT=1 ROCR_VISIBLE_DEVICES=GPU-44d96d7bf1798b3e]" time=2025-10-17T20:38:01.537Z level=TRACE source=runner.go:529 msg="starting runner for device discovery" env="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOSTNAME=compute-01 TZ=America/Los_Angeles OLLAMA_DEBUG=2 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama:/usr/lib/ollama/rocm:/usr/lib/ollama/rocm:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 NVIDIA_DRIVER_CAPABILITIES=compute,utility NVIDIA_VISIBLE_DEVICES=all OLLAMA_HOST=0.0.0.0:11434 HOME=/root OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm GGML_CUDA_INIT=1 ROCR_VISIBLE_DEVICES=GPU-44d96d7bf1798b3e]" cmd="/usr/bin/ollama runner --ollama-engine --port 46607" time=2025-10-17T20:38:01.544Z level=INFO source=runner.go:1332 msg="starting ollama engine" time=2025-10-17T20:38:01.544Z level=INFO source=runner.go:1367 msg="Server listening on 127.0.0.1:46607" time=2025-10-17T20:38:01.547Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-10-17T20:38:01.547Z level=DEBUG source=gguf.go:578 msg=general.architecture type=string time=2025-10-17T20:38:01.547Z level=DEBUG source=gguf.go:578 msg=tokenizer.ggml.model type=string time=2025-10-17T20:38:01.547Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32 time=2025-10-17T20:38:01.547Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.file_type default=0 time=2025-10-17T20:38:01.547Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default="" time=2025-10-17T20:38:01.547Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default="" time=2025-10-17T20:38:01.547Z level=INFO source=ggml.go:134 msg="" 
architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3 time=2025-10-17T20:38:01.547Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so time=2025-10-17T20:38:01.551Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/rocm /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: ggml_cuda_init: initializing rocBLAS on device 0 ggml_cuda_init: rocBLAS initialized on device 0 Device 0: AMD Radeon Graphics, gfx1100 (0x1100), VMM: no, Wave Size: 32, ID: GPU-44d96d7bf1798b3e load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so time=2025-10-17T20:38:03.317Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc) time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0 time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.pooling_type default=0 time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.expert_count default=0 time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}" time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}" time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}" time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}" time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0 time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2025-10-17T20:38:03.317Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.pre default="" time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.embedding_length default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count_kv default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key 
with type not found" key=llama.attention.key_length default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.dimension_count default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.freq_base default=100000 time=2025-10-17T20:38:03.318Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.scaling.factor default=1 time=2025-10-17T20:38:03.318Z level=DEBUG source=runner.go:1307 msg="dummy model load took" duration=1.77050867s time=2025-10-17T20:38:03.318Z level=DEBUG source=runner.go:1312 msg="gathering device infos took" duration=18.181µs time=2025-10-17T20:38:03.318Z level=TRACE source=runner.go:548 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices="[{DeviceID:{ID:GPU-44d96d7bf1798b3e Library:ROCm} Name:ROCm0 Description:AMD Radeon Graphics FilteredID: Integrated:false PCIID:06:00.0 TotalMemory:25753026560 FreeMemory:25193086976 ComputeMajor:17 ComputeMinor:0 DriverMajor:60342 DriverMinor:13 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/rocm]}]" time=2025-10-17T20:38:03.318Z level=DEBUG source=runner.go:451 msg="bootstrap discovery took" duration=1.781482227s OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs="[GGML_CUDA_INIT=1 ROCR_VISIBLE_DEVICES=GPU-44d96d7bf1798b3e]" time=2025-10-17T20:38:03.318Z level=TRACE source=runner.go:171 msg="supported GPU library combinations" supported=map[ROCm:map[/usr/lib/ollama/rocm:map[GPU-44d96d7bf1798b3e:0]]] time=2025-10-17T20:38:03.318Z level=DEBUG source=runner.go:45 msg="GPU bootstrap discovery took" duration=3.217050308s time=2025-10-17T20:38:03.318Z level=INFO source=types.go:112 msg="inference compute" id=GPU-44d96d7bf1798b3e library=ROCm compute=gfx1100 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=06:00.0 type=discrete total="24.0 GiB" available="23.9 GiB"
@thk-socal commented on GitHub (Oct 17, 2025):
Next set of errors after it loaded up a model, responded, and then I ran another small model:
time=2025-10-17T20:56:18.269Z level=INFO source=runner.go:545 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=[] error="failed to finish discovery before timeout"
time=2025-10-17T20:56:18.269Z level=WARN source=runner.go:347 msg="unable to refresh free memory, using old values"
time=2025-10-17T20:56:20.016Z level=INFO source=runner.go:545 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=[] error="failed to finish discovery before timeout"
time=2025-10-17T20:56:20.017Z level=WARN source=runner.go:347 msg="unable to refresh free memory, using old values"
time=2025-10-17T20:56:20.017Z level=INFO source=runner.go:545 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=[] error="failed to finish discovery before timeout"
time=2025-10-17T20:56:20.017Z level=WARN source=runner.go:347 msg="unable to refresh free memory, using old values"
@dhiltgen commented on GitHub (Oct 17, 2025):
@thk-socal from those latest logs it looks like it did discover your GPU and run inference on it. Is that correct?
@thk-socal commented on GitHub (Oct 17, 2025):
@dhiltgen it did discover and run inference correctly, but then it appeared to have a memory issue and break. Dropping back to 0.12.3 now to see what happens, and I will continue trying to narrow down the issue.
@dhiltgen commented on GitHub (Oct 17, 2025):
@thk-socal depending on which models you're trying to use, you can try OLLAMA_NEW_ENGINE=1 to get the benefits of the new memory management logic, which might help.
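For a Docker setup like the one described in this thread, that might look like the following (a sketch based on the standard ollama/ollama:rocm run command from the Docker docs, with the flag added):
docker run -d --device /dev/kfd --device /dev/dri \
  -e OLLAMA_NEW_ENGINE=1 \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:rocm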
@thk-socal commented on GitHub (Oct 20, 2025):
@dhiltgen in previous versions, it did not seem to matter if the OS had power-managed the GPU when not in use. Since some of the latest versions, that appears to be an issue. I just told Linux not to power manage that specific PCI card, and things appear to be stabilizing. I am wondering if the new code does not allow time to spin the GPU back up from a power-minimizing state.
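For reference, opting a single PCI device out of Linux runtime power management is typically done through sysfs; a sketch, with the GPU's PCI address as a placeholder:
echo on | sudo tee /sys/bus/pci/devices/0000:03:00.0/power/control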
@dhiltgen commented on GitHub (Oct 20, 2025):
@thk-socal thanks for that info. We do want GPUs to be able to go back to a low-power state when not in use. I've merged a change to give the system more time when discovering AMD GPUs, which should help mitigate this, but I'm hoping we can find a way to speed up the process of waking the device back up.
@jack-running commented on GitHub (Oct 21, 2025):
It looks like it found the discrete GPU only in the first case, but then it doesn't use the RX 7900 XTX 24 GB GPU at all; everything runs as if there is only the onboard GPU, and only the CPU is used for inference.
I'm attaching the whole debug log file from the commands you gave previously.
ollama_0.12.6_DEBUG_2.txt
time=2025-10-17T14:42:56.662+02:00 level=DEBUG source=server.go:915 msg="available gpu" id=0 library=ROCm "available layer vram"="1.4 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="123.2 MiB"
time=2025-10-17T14:42:56.662+02:00 level=DEBUG source=server.go:732 msg="new layout created" layers="3[ID:0 Layers:3(21..23)]"
time=2025-10-17T14:42:56.662+02:00 level=INFO source=runner.go:1205 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:8 GPULayers:3[ID:0 Layers:3(21..23)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-17T14:42:56.696+02:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-10-17T14:42:56.697+02:00 level=TRACE source=ggml.go:277 msg="created tensor" name=blk.0.attn_k.bias shape=[512] dtype=0 buffer_type=CPU
time=2025-10-17T14:42:56.697+02:00 level=TRACE source=ggml.go:277 msg="created tensor" name=blk.0.attn_k.weight shape="[2880 512]" dtype=30 buffer_type=CPU
@dhiltgen commented on GitHub (Oct 21, 2025):
@jack-running it looks like the IDs may be getting mixed up and it's incorrectly looking up the iGPU information when it should be matching the discrete GPU. This might be fixed by #12540; however, as a temporary workaround, you can try setting
HIP_VISIBLE_DEVICES=1
until we get this fixed.
@dhiltgen commented on GitHub (Oct 29, 2025):
The new release 0.12.7 should resolve these GPU discovery problems. Please upgrade and give it a try. If you're still having problems, please share a new server log with OLLAMA_DEBUG=2 set so we can take another look.
@thk-socal commented on GitHub (Oct 30, 2025):
Looks good so far. I left all device IDs commented out and they were discovered properly. I am running with only the new engine flag. Loaded and ran inference... now to see what it does after sitting for a few hours before I run it again.
Thanks for all the work!
@thk-socal commented on GitHub (Oct 30, 2025):
@dhiltgen worked great for the first run. Then, after I let it sit for 30 minutes or so, I tried again and it hung. The first run's model appeared to still be in GPU memory, and it would not unload it or use it again. I restarted the Docker container and the memory was STILL locked from the previous run, so instead of 24 GB I had 11 GB. Not enough to load the 13 GB model, and it just hung.
Not sure yet why.
@dhiltgen commented on GitHub (Oct 30, 2025):
@thk-socal bouncing the docker container should clear everything up. Maybe there's a docker bug or amdgpu driver bug? You could try bouncing the docker service itself to see if that clears things up to help isolate where the problem is.
@thk-socal commented on GitHub (Oct 30, 2025):
@dhiltgen bounce/kill/etc. didn't help; I had to reboot while 0.12.7 was running. Now on 0.12.1 I do not have the problem. It is like it does not unload the model properly after the time period expires. 0.12.1 is working as intended without issues. Again, the first run on 0.12.7 was perfect, but after the standard 5-minute timeout for unloading models, something hangs. I will try other things to narrow it down.
@dhiltgen commented on GitHub (Oct 30, 2025):
One data point that will be helpful to understand is whether the actual VRAM usage stays up, or whether we're getting bad information about VRAM usage for some reason. On Windows, we are using a Windows-specific AMD library that yields accurate VRAM information, but on Linux, we're leveraging ROCm APIs in the new discovery code. The old discovery code used sysfs.
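To see what the hardware itself reports on Linux, independent of Ollama's discovery path, the amdgpu sysfs counters (the old discovery source) or rocm-smi can be read directly; the card index is a placeholder:
cat /sys/class/drm/card0/device/mem_info_vram_used
cat /sys/class/drm/card0/device/mem_info_vram_total
rocm-smi --showmeminfo vram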
@jack-running commented on GitHub (Oct 30, 2025):
I was very excited when I saw your message about 0.12.7 being available and I ran the tests with 0.12.7 as soon as possible, but my excitement faded away quite quickly :(
It seems that inference by default still runs on the CPU despite detecting the 24 GB GPU, and even with the suggested environment variable HIP_VISIBLE_DEVICES=1 set, I couldn't get it running any better. (It seems I need to go back to version 0.12.3, which still runs fine on the RX 7900 XTX.)
Here are the logs:
serve_2025-10-30.log
serve_2025-10-30_HIP1.log
@dhiltgen commented on GitHub (Oct 30, 2025):
@jack-running thanks for the updated logs. I think I see what is going wrong and I'm working on a fix...
@thk-socal commented on GitHub (Oct 31, 2025):
@dhiltgen I rolled the AMD GPU Linux driver back from 7.1 to 6.4, and that cleared this up as well.
@dhiltgen commented on GitHub (Oct 31, 2025):
@thk-socal thanks for that data point! Please go ahead and file a new issue noting that driver 7.1 seems to have problems with current Ollama so we can track that.
@jack-running commented on GitHub (Oct 31, 2025):
0.12.8 works like a charm!