[GH-ISSUE #13320] Latest version of Ollama does not use GPU #70856

Open
opened 2026-05-04 23:13:21 -05:00 by GiteaMirror · 20 comments
Owner

Originally created by @Theblackcat98 on GitHub (Dec 3, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13320

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Before 0.13.1, when running any model (especially ones that fit entirely in my VRAM), Ollama would automatically use the GPU.
Now every model I try is loaded into RAM only and runs on the CPU at 100%.
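Whether any layers were offloaded is visible directly in the server log. As a minimal sketch, the fraction can be pulled out of captured log text like this (`log_line` below is a sample copied from the attached log; real output would come from `journalctl -u ollama.service`):

```shell
# Sample line captured from the ollama server log (see the attached log).
log_line='time=2025-12-03T14:48:11.944-08:00 level=INFO source=ggml.go:494 msg="offloaded 0/25 layers to GPU"'

# Extract the "N/M" offload fraction; 0/25 means pure CPU inference.
echo "$log_line" | grep -o 'offloaded [0-9]*/[0-9]* layers' | awk '{print $2}'
# prints: 0/25
```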

Relevant log output

log

Dec 03 13:35:31 veritas systemd[1]: Started ollama.service - Ollama Service.
Dec 03 13:35:31 veritas ollama[1359]: time=2025-12-03T13:35:31.500-08:00 level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Dec 03 13:35:31 veritas ollama[1359]: time=2025-12-03T13:35:31.579-08:00 level=INFO source=images.go:522 msg="total blobs: 86"
Dec 03 13:35:31 veritas ollama[1359]: time=2025-12-03T13:35:31.579-08:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
Dec 03 13:35:31 veritas ollama[1359]: time=2025-12-03T13:35:31.579-08:00 level=INFO source=routes.go:1597 msg="Listening on [::]:11434 (version 0.13.1)"
Dec 03 13:35:31 veritas ollama[1359]: time=2025-12-03T13:35:31.579-08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
Dec 03 13:35:31 veritas ollama[1359]: time=2025-12-03T13:35:31.580-08:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 44367"
Dec 03 13:35:31 veritas ollama[1359]: time=2025-12-03T13:35:31.592-08:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="60.4 GiB" available="58.9 GiB"
Dec 03 13:35:31 veritas ollama[1359]: time=2025-12-03T13:35:31.592-08:00 level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
Dec 03 14:47:56 veritas ollama[1359]: [GIN] 2025/12/03 - 14:47:56 | 200 |      31.618µs |       127.0.0.1 | HEAD     "/"
Dec 03 14:47:56 veritas ollama[1359]: [GIN] 2025/12/03 - 14:47:56 | 200 |      56.775µs |       127.0.0.1 | GET      "/api/ps"
Dec 03 14:48:00 veritas ollama[1359]: [GIN] 2025/12/03 - 14:48:00 | 200 |      20.268µs |       127.0.0.1 | HEAD     "/"
Dec 03 14:48:00 veritas ollama[1359]: [GIN] 2025/12/03 - 14:48:00 | 200 |    1.234649ms |       127.0.0.1 | GET      "/api/tags"
Dec 03 14:48:11 veritas ollama[1359]: [GIN] 2025/12/03 - 14:48:11 | 200 |      23.434µs |       127.0.0.1 | HEAD     "/"
Dec 03 14:48:11 veritas ollama[1359]: [GIN] 2025/12/03 - 14:48:11 | 200 |   90.771223ms |       127.0.0.1 | POST     "/api/show"
Dec 03 14:48:11 veritas ollama[1359]: [GIN] 2025/12/03 - 14:48:11 | 200 |   44.195266ms |       127.0.0.1 | POST     "/api/show"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.780-08:00 level=INFO source=server.go:209 msg="enabling flash attention"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.781-08:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 --port 34437"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.781-08:00 level=INFO source=sched.go:443 msg="system memory" total="60.4 GiB" free="56.8 GiB" free_swap="8.0 GiB"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.781-08:00 level=INFO source=server.go:702 msg="loading model" "model layers"=25 requested=-1
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.786-08:00 level=INFO source=runner.go:1398 msg="starting ollama engine"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.786-08:00 level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:34437"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.791-08:00 level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:16000 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.816-08:00 level=INFO source=ggml.go:136 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=315 num_key_values=30
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.816-08:00 level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.834-08:00 level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:16000 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.943-08:00 level=INFO source=runner.go:1271 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:16000 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.943-08:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.944-08:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.944-08:00 level=INFO source=ggml.go:494 msg="offloaded 0/25 layers to GPU"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.944-08:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="12.8 GiB"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.944-08:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="486.0 MiB"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.944-08:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="139.5 MiB"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.944-08:00 level=INFO source=device.go:272 msg="total memory" size="13.4 GiB"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.944-08:00 level=INFO source=sched.go:517 msg="loaded runners" count=1
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.944-08:00 level=INFO source=server.go:1294 msg="waiting for llama runner to start responding"
Dec 03 14:48:11 veritas ollama[1359]: time=2025-12-03T14:48:11.944-08:00 level=INFO source=server.go:1328 msg="waiting for server to become available" status="llm server loading model"
Dec 03 14:48:49 veritas ollama[1359]: time=2025-12-03T14:48:49.292-08:00 level=INFO source=server.go:1332 msg="llama runner started in 37.51 seconds"
Dec 03 14:48:49 veritas ollama[1359]: [GIN] 2025/12/03 - 14:48:49 | 200 | 37.670049567s |       127.0.0.1 | POST     "/api/generate"
Dec 03 14:50:44 veritas ollama[1359]: [GIN] 2025/12/03 - 14:50:44 | 200 | 17.345050616s |       127.0.0.1 | POST     "/api/chat"
Dec 03 14:51:00 veritas ollama[1359]: [GIN] 2025/12/03 - 14:51:00 | 200 |      30.677µs |       127.0.0.1 | GET      "/api/version"
Dec 03 14:51:02 veritas ollama[1359]: [GIN] 2025/12/03 - 14:51:02 | 200 |      18.154µs |       127.0.0.1 | HEAD     "/"
Dec 03 14:51:02 veritas ollama[1359]: [GIN] 2025/12/03 - 14:51:02 | 200 |      18.965µs |       127.0.0.1 | GET      "/api/ps"
Dec 03 15:04:54 veritas ollama[1359]: [GIN] 2025/12/03 - 15:04:54 | 200 |      17.913µs |       127.0.0.1 | HEAD     "/"
Dec 03 15:04:54 veritas ollama[1359]: [GIN] 2025/12/03 - 15:04:54 | 200 |     985.845µs |       127.0.0.1 | GET      "/api/tags"
Dec 03 15:04:59 veritas ollama[1359]: [GIN] 2025/12/03 - 15:04:59 | 200 |      17.563µs |       127.0.0.1 | HEAD     "/"
Dec 03 15:04:59 veritas ollama[1359]: [GIN] 2025/12/03 - 15:04:59 | 200 |   45.993599ms |       127.0.0.1 | POST     "/api/show"
Dec 03 15:04:59 veritas ollama[1359]: [GIN] 2025/12/03 - 15:04:59 | 200 |   43.656007ms |       127.0.0.1 | POST     "/api/show"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.005-08:00 level=INFO source=server.go:209 msg="enabling flash attention"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.006-08:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 --port 44463"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.006-08:00 level=INFO source=sched.go:443 msg="system memory" total="60.4 GiB" free="57.0 GiB" free_swap="8.0 GiB"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.006-08:00 level=INFO source=server.go:702 msg="loading model" "model layers"=25 requested=-1
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.011-08:00 level=INFO source=runner.go:1398 msg="starting ollama engine"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.011-08:00 level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:44463"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.018-08:00 level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:16000 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.043-08:00 level=INFO source=ggml.go:136 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=315 num_key_values=30
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.043-08:00 level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.060-08:00 level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:16000 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.170-08:00 level=INFO source=runner.go:1271 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:16000 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.170-08:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.170-08:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.170-08:00 level=INFO source=ggml.go:494 msg="offloaded 0/25 layers to GPU"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.170-08:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="12.8 GiB"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.170-08:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="486.0 MiB"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.170-08:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="139.5 MiB"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.170-08:00 level=INFO source=device.go:272 msg="total memory" size="13.4 GiB"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.170-08:00 level=INFO source=sched.go:517 msg="loaded runners" count=1
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.170-08:00 level=INFO source=server.go:1294 msg="waiting for llama runner to start responding"
Dec 03 15:05:00 veritas ollama[1359]: time=2025-12-03T15:05:00.170-08:00 level=INFO source=server.go:1328 msg="waiting for server to become available" status="llm server loading model"
Dec 03 15:05:01 veritas ollama[1359]: time=2025-12-03T15:05:01.429-08:00 level=INFO source=server.go:1332 msg="llama runner started in 1.42 seconds"
Dec 03 15:05:01 veritas ollama[1359]: [GIN] 2025/12/03 - 15:05:01 | 200 |  1.582630134s |       127.0.0.1 | POST     "/api/generate"
Dec 03 15:05:18 veritas ollama[1359]: [GIN] 2025/12/03 - 15:05:18 | 200 | 13.939118537s |       127.0.0.1 | POST     "/api/chat"
Dec 03 15:05:23 veritas ollama[1359]: [GIN] 2025/12/03 - 15:05:23 | 200 |      21.309µs |       127.0.0.1 | HEAD     "/"
Dec 03 15:05:23 veritas ollama[1359]: [GIN] 2025/12/03 - 15:05:23 | 200 |      19.596µs |       127.0.0.1 | GET      "/api/ps"
Dec 03 15:12:53 veritas ollama[1359]: [GIN] 2025/12/03 - 15:12:53 | 200 |      17.853µs |       127.0.0.1 | HEAD     "/"
Dec 03 15:12:53 veritas ollama[1359]: [GIN] 2025/12/03 - 15:12:53 | 200 |      10.349µs |       127.0.0.1 | GET      "/api/ps"

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.13.1

GiteaMirror added the bug label 2026-05-04 23:13:21 -05:00
Author
Owner

@Theblackcat98 commented on GitHub (Dec 3, 2025):

Here is the log after downgrading to 0.12.1:

Details

Dec 03 15:40:27 veritas systemd[1]: Stopping ollama.service - Ollama Service...
Dec 03 15:40:27 veritas systemd[1]: ollama.service: Deactivated successfully.
Dec 03 15:40:27 veritas systemd[1]: Stopped ollama.service - Ollama Service.
Dec 03 15:40:27 veritas systemd[1]: Started ollama.service - Ollama Service.
Dec 03 15:40:27 veritas ollama[54399]: time=2025-12-03T15:40:27.608-08:00 level=INFO source=routes.go:1475 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Dec 03 15:40:27 veritas ollama[54399]: time=2025-12-03T15:40:27.610-08:00 level=INFO source=images.go:518 msg="total blobs: 86"
Dec 03 15:40:27 veritas ollama[54399]: time=2025-12-03T15:40:27.610-08:00 level=INFO source=images.go:525 msg="total unused blobs removed: 0"
Dec 03 15:40:27 veritas ollama[54399]: time=2025-12-03T15:40:27.610-08:00 level=INFO source=routes.go:1528 msg="Listening on [::]:11434 (version 0.12.1)"
Dec 03 15:40:27 veritas ollama[54399]: time=2025-12-03T15:40:27.610-08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Dec 03 15:40:27 veritas ollama[54399]: time=2025-12-03T15:40:27.614-08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/download/linux-drivers.html" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Dec 03 15:40:27 veritas ollama[54399]: time=2025-12-03T15:40:27.615-08:00 level=INFO source=amd_linux.go:390 msg="amdgpu is supported" gpu=GPU-ca2ba20945f921d5 gpu_type=gfx1100
Dec 03 15:40:27 veritas ollama[54399]: time=2025-12-03T15:40:27.615-08:00 level=WARN source=amd_linux.go:380 msg="amdgpu is not supported (supported types:[gfx1010 gfx1012 gfx1030 gfx1100 gfx1101 gfx1102 gfx1151 gfx1200 gfx1201 gfx900 gfx906 gfx908 gfx90a gfx942])" gpu_type=gfx1036 gpu=1 library=/usr/local/lib/ollama/rocm
Dec 03 15:40:27 veritas ollama[54399]: time=2025-12-03T15:40:27.615-08:00 level=WARN source=amd_linux.go:387 msg="See https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides for HSA_OVERRIDE_GFX_VERSION usage"
Dec 03 15:40:27 veritas ollama[54399]: time=2025-12-03T15:40:27.615-08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-ca2ba20945f921d5 library=rocm variant="" compute=gfx1100 driver=0.0 name=1002:744c total="20.0 GiB" available="19.3 GiB"
Dec 03 15:40:27 veritas ollama[54399]: time=2025-12-03T15:40:27.615-08:00 level=INFO source=routes.go:1569 msg="entering low vram mode" "total vram"="20.0 GiB" threshold="20.0 GiB"
Dec 03 15:40:38 veritas ollama[54399]: [GIN] 2025/12/03 - 15:40:38 | 200 | 43.331µs | 127.0.0.1 | GET "/api/version"
Dec 03 15:40:42 veritas ollama[54399]: [GIN] 2025/12/03 - 15:40:42 | 200 | 26.479µs | 127.0.0.1 | GET "/api/version"
Dec 03 15:40:59 veritas ollama[54399]: [GIN] 2025/12/03 - 15:40:59 | 200 | 26.189µs | 127.0.0.1 | GET "/api/version"
Dec 03 15:41:02 veritas ollama[54399]: [GIN] 2025/12/03 - 15:41:02 | 200 | 22.162µs | 127.0.0.1 | HEAD "/"
Dec 03 15:41:02 veritas ollama[54399]: [GIN] 2025/12/03 - 15:41:02 | 200 | 1.050075ms | 127.0.0.1 | GET "/api/tags"
Dec 03 15:41:07 veritas ollama[54399]: [GIN] 2025/12/03 - 15:41:07 | 200 | 20.578µs | 127.0.0.1 | HEAD "/"
Dec 03 15:41:07 veritas ollama[54399]: [GIN] 2025/12/03 - 15:41:07 | 200 | 45.693055ms | 127.0.0.1 | POST "/api/show"
Dec 03 15:41:07 veritas ollama[54399]: time=2025-12-03T15:41:07.529-08:00 level=INFO source=server.go:200 msg="model wants flash attention"
Dec 03 15:41:07 veritas ollama[54399]: time=2025-12-03T15:41:07.529-08:00 level=INFO source=server.go:217 msg="enabling flash attention"
Dec 03 15:41:07 veritas ollama[54399]: time=2025-12-03T15:41:07.530-08:00 level=INFO source=server.go:399 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 --port 36357"
Dec 03 15:41:07 veritas ollama[54399]: time=2025-12-03T15:41:07.530-08:00 level=INFO source=server.go:672 msg="loading model" "model layers"=25 requested=-1
Dec 03 15:41:07 veritas ollama[54399]: time=2025-12-03T15:41:07.530-08:00 level=INFO source=server.go:678 msg="system memory" total="60.4 GiB" free="56.6 GiB" free_swap="8.0 GiB"
Dec 03 15:41:07 veritas ollama[54399]: time=2025-12-03T15:41:07.530-08:00 level=INFO source=server.go:686 msg="gpu memory" id=GPU-ca2ba20945f921d5 available="18.9 GiB" free="19.3 GiB" minimum="457.0 MiB" overhead="0 B"
Dec 03 15:41:07 veritas ollama[54399]: time=2025-12-03T15:41:07.535-08:00 level=INFO source=runner.go:1252 msg="starting ollama engine"
Dec 03 15:41:07 veritas ollama[54399]: time=2025-12-03T15:41:07.535-08:00 level=INFO source=runner.go:1287 msg="Server listening on 127.0.0.1:36357"
Dec 03 15:41:07 veritas ollama[54399]: time=2025-12-03T15:41:07.541-08:00 level=INFO source=runner.go:1171 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:16000 KvCacheType: NumThreads:8 GPULayers:25[ID:GPU-ca2ba20945f921d5 Layers:25(0..24)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Dec 03 15:41:07 veritas ollama[54399]: time=2025-12-03T15:41:07.565-08:00 level=INFO source=ggml.go:131 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=315 num_key_values=30
Dec 03 15:41:07 veritas ollama[54399]: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
Dec 03 15:41:07 veritas ollama[54399]: /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
Dec 03 15:41:08 veritas ollama[54399]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
Dec 03 15:41:08 veritas ollama[54399]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Dec 03 15:41:08 veritas ollama[54399]: ggml_cuda_init: found 1 ROCm devices:
Dec 03 15:41:08 veritas ollama[54399]: Device 0: AMD Radeon Graphics, gfx1100 (0x1100), VMM: no, Wave Size: 32, ID: GPU-ca2ba20945f921d5
Dec 03 15:41:08 veritas ollama[54399]: load_backend: loaded ROCm backend from /usr/local/lib/ollama/libggml-hip.so
Dec 03 15:41:08 veritas ollama[54399]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-icelake.so
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.045-08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.177-08:00 level=INFO source=runner.go:1171 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:16000 KvCacheType: NumThreads:8 GPULayers:25[ID:GPU-ca2ba20945f921d5 Layers:25(0..24)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.216-08:00 level=INFO source=runner.go:1171 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:16000 KvCacheType: NumThreads:8 GPULayers:25[ID:GPU-ca2ba20945f921d5 Layers:25(0..24)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.216-08:00 level=INFO source=ggml.go:487 msg="offloading 24 repeating layers to GPU"
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.216-08:00 level=INFO source=ggml.go:493 msg="offloading output layer to GPU"
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.216-08:00 level=INFO source=ggml.go:498 msg="offloaded 25/25 layers to GPU"
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.216-08:00 level=INFO source=backend.go:310 msg="model weights" device=ROCm0 size="11.8 GiB"
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.216-08:00 level=INFO source=backend.go:315 msg="model weights" device=CPU size="1.1 GiB"
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.216-08:00 level=INFO source=backend.go:321 msg="kv cache" device=ROCm0 size="486.0 MiB"
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.216-08:00 level=INFO source=backend.go:332 msg="compute graph" device=ROCm0 size="129.5 MiB"
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.216-08:00 level=INFO source=backend.go:337 msg="compute graph" device=CPU size="5.6 MiB"
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.216-08:00 level=INFO source=backend.go:342 msg="total memory" size="13.4 GiB"
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.216-08:00 level=INFO source=sched.go:470 msg="loaded runners" count=1
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.216-08:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
Dec 03 15:41:08 veritas ollama[54399]: time=2025-12-03T15:41:08.216-08:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
Dec 03 15:41:09 veritas ollama[54399]: time=2025-12-03T15:41:09.723-08:00 level=INFO source=server.go:1289 msg="llama runner started in 2.19 seconds"
Dec 03 15:41:09 veritas ollama[54399]: [GIN] 2025/12/03 - 15:41:09 | 200 | 2.322500244s | 127.0.0.1 | POST "/api/generate"
Dec 03 15:41:12 veritas ollama[54399]: [GIN] 2025/12/03 - 15:41:12 | 200 | 923.045423ms | 127.0.0.1 | POST "/api/chat"
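This 0.12.1 log shows all 25 layers offloaded to the GPU. As a quick check on any version, `ollama ps` reports the CPU/GPU split for each loaded model; a minimal sketch (assumes the `ollama` binary is on PATH):

```shell
# The PROCESSOR column of `ollama ps` shows e.g. "100% GPU" or "100% CPU"
# for each loaded model, which confirms where inference is actually running.
if command -v ollama >/dev/null 2>&1; then
  status="$(ollama ps)"
else
  status="ollama binary not found in PATH"
fi
printf '%s\n' "$status"
```

On a system showing the bug, a model that should fit in VRAM would report `100% CPU` here.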


@stephensrmmartin commented on GitHub (Dec 4, 2025):

Just confirming that I experienced this same issue on 0.13.1-rc1. On Vulkan, it would fail to allocate enough VRAM and kick everything out to the CPU. On ROCm, it just failed altogether.


@rick-github commented on GitHub (Dec 4, 2025):

Your device was not detected. Set `OLLAMA_DEBUG=2` in the server environment and post the log up to the line that says "inference compute".

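For the standard Linux install, one way to apply this is with a systemd override (a sketch, assuming the service is named `ollama` and logs go to the journal; paths may differ on your distribution):

```shell
sudo mkdir -p /etc/systemd/system/ollama.service.d
printf '[Service]\nEnvironment="OLLAMA_DEBUG=2"\n' | sudo tee /etc/systemd/system/ollama.service.d/debug.conf
sudo systemctl daemon-reload
sudo systemctl restart ollama
# capture the startup log up to the "inference compute" line
journalctl -u ollama -b --no-pager | sed -n '1,/inference compute/p'
```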

@LukaLoginska commented on GitHub (Dec 4, 2025):

I confirm this happens on Windows 10 with NVIDIA/CUDA. Despite running the Ollama release that supposedly fixed GPUs not getting detected, it is still not detecting GPUs correctly with NVIDIA CUDA.

Strangely, **adding the ollama DLL folders to the PATH manually seems to fix GPU detection**, but it replaces it with a different issue! Here are the log files so someone who can investigate can look into it.

Ollama installed version: 0.13.1
NVIDIA 581.57 64-bit Win10/11 drivers installed
NVIDIA CUDA 13.0.2 installed - **notably, to a different drive and not the default install location.** I cannot test installing this to my C:\ drive, because my C:\ drive is too small to fit anything else in it until I get a new SSD.

[SCENARIO 1 - server.log without ollama DLL folders in PATH.log](https://github.com/user-attachments/files/23939529/SCENARIO.1.-.server.log.without.ollama.DLL.folders.in.PATH.log)

[SCENARIO 2 - server ollama DLL folders manually included in PATH.log](https://github.com/user-attachments/files/23939192/SCENARIO.2.-.server.ollama.DLL.folders.manually.included.in.PATH.log)

In Scenario 2 (where I added the folder containing ollama's DLL files to the PATH environment variable), take note of the following lines especially:
**CUDA error: an unsupported value or parameter was passed to the function
current device: 0, in function ggml_cuda_mul_mat_batched_cublas_impl at C:\a\ollama\ollama\ml\backend\ggml\ggml\src\ggml-cuda\ggml-cuda.cu:2114**

I also notice that Ollama and NVIDIA CUDA both ship a cublas64_13.dll file - maybe there is a version mismatch.

(Edit: clarified my first sentence and added a picture of my environment variables in Scenario 2. Also be aware I redacted my username to "ll".)
![Image](https://github.com/user-attachments/assets/f09a56aa-121c-491d-92b6-6d34930360a7)


@rick-github commented on GitHub (Dec 4, 2025):

> Set `OLLAMA_DEBUG=2` in the server environment and post the log up to the line that says "inference compute".


@LukaLoginska commented on GitHub (Dec 4, 2025):

@rick-github Oops sorry!

Yes, I've now fully confirmed that adding a PATH entry for `<MyOllamaInstallationDirectory>\lib\lib\ollama` fixes the GPU detection issue. Can OP try the same?

Here's the logs (this time I used OLLAMA_DEBUG=2 in my environment variables + restarted ollama):
[SCENARIO 3 - only include one path + OLLAMA_DEBUG.log](https://github.com/user-attachments/files/23939620/SCENARIO.3.-.only.include.one.path.%2B.OLLAMA_DEBUG.log)

![Image](https://github.com/user-attachments/assets/e6ed2c24-c633-4d22-91fa-245e587dece7)
![Image](https://github.com/user-attachments/assets/b3dda035-34e1-485b-93d4-a88fb1d6fba3)

@rick-github commented on GitHub (Dec 4, 2025):

`OLLAMA_DEBUG` is not set to 2 in the server environment.

time=2025-12-04T17:52:20.611+01:00 level=WARN source=runner.go:501 msg="potentially incompatible library detected in PATH" location=C:\Users\kenny\AppData\Local\Programs\Ollama\lib\lib\ollama\ggml-base.dll

There are too many `lib`s in that path. How did you install ollama?


@puresick commented on GitHub (Dec 5, 2025):

> Just confirming that I experienced this same issue on 0.13.1 rc1. On vulkan, it would fail to allocate enough vram and kick everything out to cpu. On ROCM, it just failed altogether.

Experiencing the same, but only for some models, especially the new Ministral models. `qwen3:30b-a3b`, for example, works fine on my GPU via Vulkan. ROCm is generally broken.

Other thread including some more detail and system configuration as reference: https://github.com/ollama/ollama/issues/13312#issuecomment-3607321769


@ndrewpj commented on GitHub (Dec 7, 2025):

Confirmed on Ollama v0.13 and v0.13.1 on AMD Strix Halo: only the CPU is used and detected. Downgrading to Ollama v0.12.11 works.


@e1ke commented on GitHub (Dec 9, 2025):

> Confirmed on ollama v.0.13 v.0.13.1 on AMD Strix Halo - only CPU is used and detected. Downgrading to ollama v.0.12.11 works

Thanks, working.
Docker image: `ollama/ollama:0.12.11-rocm`
Confirmed on a GMKtec EVO-X2 + 128 GB + Ryzen AI Max+ 395.

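The downgrade workaround above can be reproduced by pinning the Docker image tag; a sketch assuming the usual ROCm container setup (the `/dev/kfd` and `/dev/dri` device passthrough and `ollama` volume name follow Ollama's Docker instructions, so adjust to your environment):

```shell
docker run -d \
  --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.12.11-rocm
```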

@dhiltgen commented on GitHub (Dec 10, 2025):

I think there may be multiple unrelated problems being reported on this issue.

@Theblackcat98 is the latest version still failing to discover your discrete AMD GPU on Linux? If so, can you share a server startup log with `OLLAMA_DEBUG=2` set so we can see what's going wrong?

@stephensrmmartin it sounds like you do not have a GPU discovery problem, but are having trouble getting models to load in Vulkan. There are other issues in the backlog tracking Vulkan memory allocation challenges which may be related to what you're seeing. Please see if you can find an issue that matches your scenario, and if not, go ahead and file a new one with reproduction steps (what model, what GPU, etc.).

@LukaLoginska is the latest version still failing to discover your discrete NVIDIA GPU on Windows? If so, can you share a server startup log with `$env:OLLAMA_DEBUG="2"` set so we can see what's going wrong? It does seem that something in your PATH may be conflicting with the versions we bundle, and these logs should help narrow it down.

@puresick it sounds like you're experiencing more specific Vulkan issues with some tensor operation(s) on your device. We have other issues in the backlog tracking Vulkan model-specific issues. Please see if you can find one that matches your scenario, and if not, go ahead and file a new issue with the model and specifics about your GPU so we can track it.

@ndrewpj are you on Linux or Windows, and are you trying to use Vulkan or ROCm? Please share your server log with `OLLAMA_DEBUG=2` set, up to the point it reports "inference compute".


@cheatofrom commented on GitHub (Dec 11, 2025):

I have the same problem; my graphics card is an NVIDIA A5000. While using Ollama it sporadically falls back to the CPU for no reason and the graphics card can no longer be detected; after a restart the graphics card works again. A while ago there was an update that didn't seem to have this issue, but after I updated to the version that supports qwen3-vl, this problem came back.


@cheatofrom commented on GitHub (Dec 11, 2025):

I am using Docker and I am quite familiar with the settings for using GPUs in Docker, so that is not the problem, but Ollama still occasionally falls back to the CPU.

<!-- gh-comment-id:3640965244 -->

@rick-github commented on GitHub (Dec 11, 2025):

https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx#linux-docker

@rastaman commented on GitHub (Dec 15, 2025):

I have the same issue on two computers with AMD iGPUs (HX 370/890M/128 GB and 8945HS/780M/64 GB): since version 0.13.* the VRAM is no longer recognized, so all layers are deferred to the CPU. The OS is 24.04 LTS with ROCm.

Ollama 0.12.11 logs:

Dec 15 17:52:59 xlab ollama[3058741]: time=2025-12-15T17:52:59.184+01:00 level=TRACE source=runner.go:448 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices="[{DeviceID:{ID:0 Library:ROCm} Name:ROCm0 Description:AMD Radeon Graphics FilterID: Integrated:true PCIID:0000:c5:00.0 TotalMemory:33071665152 FreeMemory:32600580096 ComputeMajor:17 ComputeMinor:0 DriverMajor:60342 DriverMinor:13 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/rocm]}]"
Dec 15 17:52:59 xlab ollama[3058741]: time=2025-12-15T17:52:59.184+01:00 level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=2.255365555s OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]"
Dec 15 17:52:59 xlab ollama[3058741]: time=2025-12-15T17:52:59.184+01:00 level=TRACE source=runner.go:156 msg="supported GPU library combinations before filtering" supported=map[ROCm:map[/usr/lib/ollama/rocm:map[0:0]]]
Dec 15 17:52:59 xlab ollama[3058741]: time=2025-12-15T17:52:59.184+01:00 level=DEBUG source=runner.go:175 msg="adjusting filtering IDs" FilterID=0 new_ID=0
Dec 15 17:52:59 xlab ollama[3058741]: time=2025-12-15T17:52:59.184+01:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=3.592841559s
Dec 15 17:52:59 xlab ollama[3058741]: time=2025-12-15T17:52:59.184+01:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1100 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=0000:c5:00.0 type=iGPU total="30.8 GiB" available="30.7 GiB"

Ollama 0.13.3 logs:

```txt
Dec 15 17:48:13 xlab ollama[3056082]: time=2025-12-15T17:48:13.138+01:00 level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices="[{DeviceID:{ID:0 Library:ROCm} Name:ROCm0 Description:AMD Radeon Graphics FilterID: Integrated:true PCIID:0000:c5:00.0 TotalMemory:1073741824 FreeMemory:652193792 ComputeMajor:17 ComputeMinor:0 DriverMajor:60342 DriverMinor:13 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/rocm]}]"
Dec 15 17:48:13 xlab ollama[3056082]: time=2025-12-15T17:48:13.138+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=2.387494025s OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]"
Dec 15 17:48:13 xlab ollama[3056082]: time=2025-12-15T17:48:13.138+01:00 level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[ROCm:map[/usr/lib/ollama/rocm:map[0:0]]]
Dec 15 17:48:13 xlab ollama[3056082]: time=2025-12-15T17:48:13.138+01:00 level=DEBUG source=runner.go:193 msg="adjusting filtering IDs" FilterID=0 new_ID=0
Dec 15 17:48:13 xlab ollama[3056082]: time=2025-12-15T17:48:13.138+01:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=3.90757218s
Dec 15 17:48:13 xlab ollama[3056082]: time=2025-12-15T17:48:13.138+01:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1100 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=0000:c5:00.0 type=iGPU total="1.0 GiB" available="622.2 MiB" <------ low VRAM detected (no GTT seen? https://github.com/ollama/ollama/issues/5471 ?)
Dec 15 17:48:13 xlab ollama[3056082]: time=2025-12-15T17:48:13.138+01:00 level=INFO source=routes.go:1648 msg="entering low vram mode" "total vram"="1.0 GiB" threshold="20.0 GiB" <------ low VRAM detected
```

So for me it is the VRAM detection that is broken, I think. IIRC it was in version 0.12.5 or 0.12.6 that the VRAM started to be fully seen on my systems. I switched back to 0.12.11 in the meantime. Unfortunately, I cannot pull new models anymore with 0.12 :-(
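The low-VRAM gate visible in the last log line above can be sketched as follows. This is illustrative only (the function names are mine, and Ollama's actual comparison in routes.go may differ in detail), but the 20 GiB threshold and the one-decimal GiB formatting match the log output:

```python
def gib(n_bytes: int) -> str:
    """Format a byte count the way the log lines do (one decimal place, GiB)."""
    return f"{n_bytes / 2**30:.1f} GiB"

# 20.0 GiB, per the "entering low vram mode" threshold in the log
LOW_VRAM_THRESHOLD = 20 * 2**30

def low_vram_mode(total_vram: int) -> bool:
    """Hypothetical sketch of the gate: total reported VRAM below threshold."""
    return total_vram < LOW_VRAM_THRESHOLD

# 0.12.11 reports the full unified memory pool:
print(gib(33071665152), low_vram_mode(33071665152))  # 30.8 GiB False
# 0.13.3 only sees the 1 GiB dedicated VRAM carve-out:
print(gib(1073741824), low_vram_mode(1073741824))    # 1.0 GiB True
```

The symptom then follows directly: once the scheduler believes only 1 GiB is available, every model is sized out of the GPU and runs on CPU.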

Edit: It is the same symptom but perhaps another issue? (In https://github.com/ollama/ollama/issues/13336 I also see memory allocation errors to GPUs for both NVIDIA and ROCm; or maybe https://github.com/ollama/ollama/pull/13196. Anyway, thanks for the good code!) Correct me as needed, best regards

<!-- gh-comment-id:3656902322 --> @rastaman commented on GitHub (Dec 15, 2025).

@rick-github commented on GitHub (Jan 14, 2026):

0.14.0 includes the GTT/VRAM PR, does it resolve the issue?
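Whether the kernel exposes a GTT pool at all can be checked directly in sysfs. The amdgpu attribute names below are kernel-provided, but the card index varies per system and the `read_mem_info` helper is just an illustrative sketch:

```python
from pathlib import Path

def read_mem_info(path: str):
    """Return the byte count from an amdgpu sysfs file, or None if it is absent."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().strip())

# On an AMD APU host (card0 is an assumption; check /sys/class/drm/):
vram = read_mem_info("/sys/class/drm/card0/device/mem_info_vram_total")
gtt = read_mem_info("/sys/class/drm/card0/device/mem_info_gtt_total")
print(f"VRAM: {vram} bytes, GTT: {gtt} bytes")
```

If the GTT total is large while the VRAM total is the ~1 GiB carve-out seen in the 0.13.3 log, the driver side is fine and the gap is in what the discovery code chooses to report.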

<!-- gh-comment-id:3749128120 --> @rick-github commented on GitHub (Jan 14, 2026).

@rastaman commented on GitHub (Jan 14, 2026):

I'll check in a few hours, thanks for the tip!

<!-- gh-comment-id:3749545511 --> @rastaman commented on GitHub (Jan 14, 2026).

@rastaman commented on GitHub (Jan 16, 2026):

Hi @rick-github,

In my case the issue is fixed: with 0.14.2 the VRAM is recognized with my settings on both systems.

XLab (8945HS/780M):

```txt
Jan 16 13:10:18 xlab ollama[1665315]: time=2026-01-16T13:10:18.357+01:00 level=DEBUG source=runner.go:1380 msg="dummy model load took" duration=3.610068387s
Jan 16 13:10:18 xlab ollama[1665315]: ggml_hip_get_device_memory searching for device 0000:c5:00.0
Jan 16 13:10:18 xlab ollama[1665315]: ggml_backend_cuda_device_get_memory device 0000:c5:00.0 utilizing AMD specific memory reporting free: 33301868544 total: 34145402880
Jan 16 13:10:18 xlab ollama[1665315]: time=2026-01-16T13:10:18.357+01:00 level=DEBUG source=runner.go:1385 msg="gathering device infos took" duration=316.84µs
Jan 16 13:10:18 xlab ollama[1665315]: time=2026-01-16T13:10:18.357+01:00 level=TRACE source=runner.go:467 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" devices="[{DeviceID:{ID:0 Library:ROCm} Name:ROCm0 Description:AMD Radeon Graphics FilterID: Integrated:true PCIID:0000:c5:00.0 TotalMemory:34145402880 FreeMemory:33301868544 ComputeMajor:17 ComputeMinor:0 DriverMajor:60342 DriverMinor:13 LibraryPath:[/usr/lib/ollama /usr/lib/ollama/rocm]}]"
Jan 16 13:10:18 xlab ollama[1665315]: time=2026-01-16T13:10:18.358+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=3.62216537s OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]"
Jan 16 13:10:18 xlab ollama[1665315]: time=2026-01-16T13:10:18.358+01:00 level=TRACE source=runner.go:174 msg="supported GPU library combinations before filtering" supported=map[ROCm:map[/usr/lib/ollama/rocm:map[0:0]]]
Jan 16 13:10:18 xlab ollama[1665315]: time=2026-01-16T13:10:18.358+01:00 level=DEBUG source=runner.go:193 msg="adjusting filtering IDs" FilterID=0 new_ID=0
Jan 16 13:10:18 xlab ollama[1665315]: time=2026-01-16T13:10:18.358+01:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=6.013717596s
Jan 16 13:10:18 xlab ollama[1665315]: time=2026-01-16T13:10:18.358+01:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1100 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=0000:c5:00.0 type=iGPU total="31.8 GiB" available="31.4 GiB"
```

Zlab (HX370/890M):

```txt
Jan 16 13:12:33 zlab ollama[1878001]: time=2026-01-16T13:12:33.755+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=673.179999ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs=map[]
Jan 16 13:12:33 zlab ollama[1878001]: time=2026-01-16T13:12:33.756+01:00 level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=1
Jan 16 13:12:33 zlab ollama[1878001]: time=2026-01-16T13:12:33.756+01:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/rocm description="AMD Radeon Graphics" compute=gfx1151 id=0 pci_id=0000:c5:00.0
Jan 16 13:12:33 zlab ollama[1878001]: time=2026-01-16T13:12:33.756+01:00 level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36793"
Jan 16 13:12:33 zlab ollama[1878001]: time=2026-01-16T13:12:33.756+01:00 level=DEBUG source=server.go:430 msg=subprocess PATH=$PATH OLLAMA_DEBUG=1 OLLAMA_KV_CACHE_TYPE=q8_0 OLLAMA_CONTEXT_LENGTH=262144 OLLAMA_HOST=http://0.0.0.0:11434 HSA_OVERRIDE_GFX_VERSION=11.5.1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/rocm ROCR_VISIBLE_DEVICES=0 GGML_CUDA_INIT=1
Jan 16 13:12:34 zlab ollama[1878001]: time=2026-01-16T13:12:34.374+01:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=618.342511ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/rocm]" extra_envs="map[GGML_CUDA_INIT:1 ROCR_VISIBLE_DEVICES:0]"
Jan 16 13:12:34 zlab ollama[1878001]: time=2026-01-16T13:12:34.374+01:00 level=DEBUG source=runner.go:193 msg="adjusting filtering IDs" FilterID=0 new_ID=0
Jan 16 13:12:34 zlab ollama[1878001]: time=2026-01-16T13:12:34.374+01:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=1.327936986s
Jan 16 13:12:34 zlab ollama[1878001]: time=2026-01-16T13:12:34.374+01:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=0000:c5:00.0 type=iGPU total="113.0 GiB" available="111.8 GiB"
```

```sh
rastaman@zlab:~$ ollama --version
ollama version is 0.14.2
```

And prompts work. Thanks a lot for the release! So for me, on AMD iGPUs, it is validated 🙏 👍

<!-- gh-comment-id:3759780285 --> @rastaman commented on GitHub (Jan 16, 2026).

@chaz-clark commented on GitHub (Apr 13, 2026):

Here is my setup; I had to downgrade from 0.20.5 to 0.13.0 to get my GPU recognized:

- Pop!_OS 24.04 with custom kernel 6.17.x
- RX 5700 XT (gfx1010:xnack-)
- Ollama 0.20.5 — GPU not detected, gpu_count=0
- Ollama 0.13.0 — GPU works perfectly with ROCm
- HSA_OVERRIDE_GFX_VERSION=10.1.0
- The fix of copying ROCm 6.3 gfx1010 TensileLibrary files
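The last two points can be sketched roughly as below. All paths here are illustrative assumptions, not the commenter's exact commands; the ROCm install location and Ollama's rocblas library directory vary by packaging:

```shell
# Make ROCm report the RX 5700 XT as gfx1010 (10.1.0) to the Ollama service:
sudo systemctl edit ollama.service
#   [Service]
#   Environment="HSA_OVERRIDE_GFX_VERSION=10.1.0"

# Copy gfx1010 Tensile kernel libraries from a ROCm 6.3 install into the
# rocblas library directory Ollama loads (hypothetical paths -- locate
# yours first, e.g. with: find / -name 'TensileLibrary*gfx1010*' 2>/dev/null):
sudo cp /opt/rocm-6.3.0/lib/rocblas/library/*gfx1010* \
        /usr/lib/ollama/rocm/rocblas/library/

sudo systemctl restart ollama
```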

<!-- gh-comment-id:4238853741 --> @chaz-clark commented on GitHub (Apr 13, 2026).

@rick-github commented on GitHub (Apr 13, 2026):

Server logs will aid in debugging.

<!-- gh-comment-id:4238867674 --> @rick-github commented on GitHub (Apr 13, 2026).
Reference: github-starred/ollama#70856