[GH-ISSUE #13103] ollama with vulkan specified, does not load to GPU #70733

Closed
opened 2026-05-04 22:46:55 -05:00 by GiteaMirror · 24 comments

Originally created by @baldpope on GitHub (Nov 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13103

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

When running OLLAMA_VULKAN=1 ollama serve, the application starts and its output implies Vulkan support is enabled:

$ export OLLAMA_VULKAN=1
$ sudo -E ollama serve
time=2025-11-15T22:19:17.307Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/jofficer/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-11-15T22:19:17.307Z level=INFO source=images.go:522 msg="total blobs: 4"
time=2025-11-15T22:19:17.307Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-15T22:19:17.307Z level=INFO source=routes.go:1597 msg="Listening on 127.0.0.1:11434 (version 0.12.11)"
time=2025-11-15T22:19:17.308Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-15T22:19:17.309Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40297"
time=2025-11-15T22:19:17.354Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 45813"
time=2025-11-15T22:19:17.374Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="11.7 GiB" available="11.0 GiB"
time=2025-11-15T22:19:17.374Z level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
[GIN] 2025/11/15 - 22:19:45 | 200 |      40.117µs |       127.0.0.1 | HEAD     "/"
[GIN] 2025/11/15 - 22:19:45 | 200 |   56.518627ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/11/15 - 22:19:45 | 200 |   53.844912ms |       127.0.0.1 | POST     "/api/show"

When loading a model, 100% goes to the CPU; the GPUs aren't used at all. Running vulkaninfo --summary shows both cards recognized:

$ vulkaninfo --summary
'DISPLAY' environment variable not set... skipping surface info
==========
VULKANINFO
==========

Vulkan Instance Version: 1.3.275


Instance Extensions: count = 24
-------------------------------
VK_EXT_acquire_drm_display             : extension revision 1
VK_EXT_acquire_xlib_display            : extension revision 1
VK_EXT_debug_report                    : extension revision 10
VK_EXT_debug_utils                     : extension revision 2
VK_EXT_direct_mode_display             : extension revision 1
VK_EXT_display_surface_counter         : extension revision 1
VK_EXT_headless_surface                : extension revision 1
VK_EXT_surface_maintenance1            : extension revision 1
VK_EXT_swapchain_colorspace            : extension revision 5
VK_KHR_device_group_creation           : extension revision 1
VK_KHR_display                         : extension revision 23
VK_KHR_external_fence_capabilities     : extension revision 1
VK_KHR_external_memory_capabilities    : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_display_properties2         : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2       : extension revision 1
VK_KHR_portability_enumeration         : extension revision 1
VK_KHR_surface                         : extension revision 25
VK_KHR_surface_protected_capabilities  : extension revision 1
VK_KHR_wayland_surface                 : extension revision 6
VK_KHR_xcb_surface                     : extension revision 6
VK_KHR_xlib_surface                    : extension revision 6
VK_LUNARG_direct_driver_loading        : extension revision 1

Instance Layers: count = 3
--------------------------
VK_LAYER_INTEL_nullhw       INTEL NULL HW                1.1.73   version 1
VK_LAYER_MESA_device_select Linux device selection layer 1.4.303  version 1
VK_LAYER_MESA_overlay       Mesa Overlay layer           1.4.303  version 1

Devices:
========
GPU0:
        apiVersion         = 1.4.305
        driverVersion      = 25.0.7
        vendorID           = 0x8086
        deviceID           = 0x56a5
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = Intel(R) Arc(tm) A380 Graphics (DG2)
        driverID           = DRIVER_ID_INTEL_OPEN_SOURCE_MESA
        driverName         = Intel open-source Mesa driver
        driverInfo         = Mesa 25.0.7-0ubuntu0.24.04.2
        conformanceVersion = 1.4.0.0
        deviceUUID         = 8680a556-0500-0000-0203-000000000000
        driverUUID         = 802b0057-40c2-aed9-e538-d78b797f04f4
GPU1:
        apiVersion         = 1.4.305
        driverVersion      = 25.0.7
        vendorID           = 0x8086
        deviceID           = 0x56a5
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = Intel(R) Arc(tm) A380 Graphics (DG2)
        driverID           = DRIVER_ID_INTEL_OPEN_SOURCE_MESA
        driverName         = Intel open-source Mesa driver
        driverInfo         = Mesa 25.0.7-0ubuntu0.24.04.2
        conformanceVersion = 1.4.0.0
        deviceUUID         = 8680a556-0500-0000-0204-000000000000
        driverUUID         = 802b0057-40c2-aed9-e538-d78b797f04f4
GPU2:
        apiVersion         = 1.4.305
        driverVersion      = 0.0.1
        vendorID           = 0x10005
        deviceID           = 0x0000
        deviceType         = PHYSICAL_DEVICE_TYPE_CPU
        deviceName         = llvmpipe (LLVM 20.1.2, 256 bits)
        driverID           = DRIVER_ID_MESA_LLVMPIPE
        driverName         = llvmpipe
        driverInfo         = Mesa 25.0.7-0ubuntu0.24.04.2 (LLVM 20.1.2)
        conformanceVersion = 1.3.1.1
        deviceUUID         = 6d657361-3235-2e30-2e37-2d3075627500
        driverUUID         = 6c6c766d-7069-7065-5555-494400000000
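
As a generic spot-check (independent of ollama), GPU activity during inference can be watched with intel_gpu_top from the intel-gpu-tools package; all engines sitting at 0% while a prompt is running means the work is staying on the CPU:

# install the tool if needed, then watch per-engine utilization on the Arc cards
sudo apt install intel-gpu-tools
sudo intel_gpu_top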

The server runs Ubuntu 24.04, fully patched and updated. The same cards are recognized and used when running llama.cpp compiled with SYCL support. I understand Vulkan support is still experimental, but what can I do to assist?

Relevant log output


OS

Linux

GPU

Intel

CPU

Intel

Ollama version

0.12.11

GiteaMirror added the intel, vulkan, bug, linux labels 2026-05-04 22:46:56 -05:00

@rick-github commented on GitHub (Nov 15, 2025):

ollama is not detecting your devices. Set OLLAMA_DEBUG=2 and start the server to get more information about device detection.
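
For example, one way to capture the extra output (OLLAMA_DEBUG=2 enables DEBUG/TRACE-level logging):

# start the server with verbose device-discovery logging and keep a copy of the log
OLLAMA_DEBUG=2 OLLAMA_VULKAN=1 ollama serve 2>&1 | tee /tmp/ollama-debug.log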


@baldpope commented on GitHub (Nov 15, 2025):

output as requested:

$ export OLLAMA_DEBUG=2
$ export OLLAMA_VULKAN=1
$ sudo -E ollama serve
time=2025-11-15T23:19:17.647Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG-4 OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/jofficer/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-11-15T23:19:17.647Z level=INFO source=images.go:522 msg="total blobs: 4"
time=2025-11-15T23:19:17.647Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-15T23:19:17.648Z level=INFO source=routes.go:1597 msg="Listening on 127.0.0.1:11434 (version 0.12.11)"
time=2025-11-15T23:19:17.648Z level=DEBUG source=sched.go:120 msg="starting llm scheduler"
time=2025-11-15T23:19:17.648Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-15T23:19:17.648Z level=TRACE source=runner.go:421 msg="starting runner for device discovery" libDirs="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v12]" extraEnvs=map[]
time=2025-11-15T23:19:17.648Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 46399"
time=2025-11-15T23:19:17.649Z level=DEBUG source=server.go:393 msg=subprocess OLLAMA_VULKAN=1 OLLAMA_DEBUG=2 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12
time=2025-11-15T23:19:17.660Z level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-11-15T23:19:17.660Z level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:46399"
time=2025-11-15T23:19:17.670Z level=DEBUG source=gguf.go:590 msg=general.architecture type=string
time=2025-11-15T23:19:17.670Z level=DEBUG source=gguf.go:590 msg=tokenizer.ggml.model type=string
time=2025-11-15T23:19:17.670Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-15T23:19:17.670Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-15T23:19:17.670Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.file_type default=0
time=2025-11-15T23:19:17.670Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default=""
time=2025-11-15T23:19:17.670Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""
time=2025-11-15T23:19:17.670Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2025-11-15T23:19:17.670Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
time=2025-11-15T23:19:17.676Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v12
time=2025-11-15T23:19:17.677Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.pooling_type default=0
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.expert_count default=0
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.embedding_length default=0
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count default=0
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.key_length default=0
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2025-11-15T23:19:17.677Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2025-11-15T23:19:17.677Z level=DEBUG source=runner.go:1373 msg="dummy model load took" duration=7.185285ms
time=2025-11-15T23:19:17.677Z level=DEBUG source=runner.go:1378 msg="gathering device infos took" duration=446ns
time=2025-11-15T23:19:17.677Z level=TRACE source=runner.go:448 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v12]" devices=[]
time=2025-11-15T23:19:17.677Z level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=29.632313ms OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v12]" extra_envs=map[]
time=2025-11-15T23:19:17.678Z level=TRACE source=runner.go:421 msg="starting runner for device discovery" libDirs="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v13]" extraEnvs=map[]
time=2025-11-15T23:19:17.678Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 37531"
time=2025-11-15T23:19:17.678Z level=DEBUG source=server.go:393 msg=subprocess OLLAMA_VULKAN=1 OLLAMA_DEBUG=2 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v13 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v13
time=2025-11-15T23:19:17.691Z level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-11-15T23:19:17.691Z level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:37531"
time=2025-11-15T23:19:17.699Z level=DEBUG source=gguf.go:590 msg=general.architecture type=string
time=2025-11-15T23:19:17.699Z level=DEBUG source=gguf.go:590 msg=tokenizer.ggml.model type=string
time=2025-11-15T23:19:17.699Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-15T23:19:17.700Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-15T23:19:17.700Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.file_type default=0
time=2025-11-15T23:19:17.700Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default=""
time=2025-11-15T23:19:17.700Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""
time=2025-11-15T23:19:17.700Z level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2025-11-15T23:19:17.700Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
time=2025-11-15T23:19:17.706Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v13
time=2025-11-15T23:19:17.707Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.pooling_type default=0
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.expert_count default=0
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.embedding_length default=0
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count default=0
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.key_length default=0
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2025-11-15T23:19:17.707Z level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2025-11-15T23:19:17.707Z level=DEBUG source=runner.go:1373 msg="dummy model load took" duration=7.90195ms
time=2025-11-15T23:19:17.707Z level=DEBUG source=runner.go:1378 msg="gathering device infos took" duration=379ns
time=2025-11-15T23:19:17.708Z level=TRACE source=runner.go:448 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v13]" devices=[]
time=2025-11-15T23:19:17.708Z level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=30.3482ms OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v13]" extra_envs=map[]
time=2025-11-15T23:19:17.708Z level=DEBUG source=runner.go:116 msg="evluating which if any devices to filter out" initial_count=0
time=2025-11-15T23:19:17.709Z level=TRACE source=runner.go:156 msg="supported GPU library combinations before filtering" supported=map[]
time=2025-11-15T23:19:17.709Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=61.687113ms
time=2025-11-15T23:19:17.709Z level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="11.7 GiB" available="11.1 GiB"
time=2025-11-15T23:19:17.710Z level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"

@rick-github commented on GitHub (Nov 15, 2025):

What's the output of ls -l /usr/local/lib/ollama, and what variant of Linux are you using?


@baldpope commented on GitHub (Nov 15, 2025):

$ ls -l /usr/local/lib/ollama/
total 6100
drwxr-xr-x 2 root root   4096 Nov 13 22:16 cuda_v12
drwxr-xr-x 2 root root   4096 Nov 13 22:14 cuda_v13
-rwxr-xr-x 1 root root 669912 Nov 13 22:00 libggml-base.so
-rwxr-xr-x 1 root root 832784 Nov 13 22:00 libggml-cpu-alderlake.so
-rwxr-xr-x 1 root root 832784 Nov 13 22:00 libggml-cpu-haswell.so
-rwxr-xr-x 1 root root 963856 Nov 13 22:00 libggml-cpu-icelake.so
-rwxr-xr-x 1 root root 775504 Nov 13 22:00 libggml-cpu-sandybridge.so
-rwxr-xr-x 1 root root 963856 Nov 13 22:00 libggml-cpu-skylakex.so
-rwxr-xr-x 1 root root 591312 Nov 13 22:00 libggml-cpu-sse42.so
-rwxr-xr-x 1 root root 587216 Nov 13 22:00 libggml-cpu-x64.so

OS Info:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.3 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

$ uname -a
Linux gpu-passthru 6.8.0-87-generic #88-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 11 09:28:41 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux


@rick-github commented on GitHub (Nov 15, 2025):

How did you install ollama?


@baldpope commented on GitHub (Nov 15, 2025):

Ran the install.sh from the Download site here: https://ollama.com/download/linux


@rick-github commented on GitHub (Nov 15, 2025):

The install script uses the presence of AMD devices to trigger the installation of the Vulkan backend, so it will need adjusting for Intel devices. In the meantime, you can manually install the Vulkan backend with:

cd /tmp && curl -LO https://github.com/ollama/ollama/releases/download/v0.12.11/ollama-linux-amd64-rocm.tgz
sudo tar zxf /tmp/ollama-linux-amd64-rocm.tgz -C /usr/local
rm /tmp/ollama-linux-amd64-rocm.tgz
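
After extracting, a Vulkan ggml backend should appear under /usr/local/lib/ollama; a quick sanity check (the exact filename is an assumption, based on the libggml-* naming in the listing above):

# confirm the Vulkan backend library was unpacked
ls -l /usr/local/lib/ollama/ | grep -i vulkan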

@baldpope commented on GitHub (Nov 15, 2025):

Installed the tar, but I'm not seeing a difference when I run the model:

time=2025-11-15T17:49:56.181-06:00 level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG-4 OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/jofficer/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-11-15T17:49:56.181-06:00 level=INFO source=images.go:522 msg="total blobs: 4"
time=2025-11-15T17:49:56.181-06:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-15T17:49:56.182-06:00 level=INFO source=routes.go:1597 msg="Listening on 127.0.0.1:11434 (version 0.12.11)"
time=2025-11-15T17:49:56.182-06:00 level=DEBUG source=sched.go:120 msg="starting llm scheduler"
time=2025-11-15T17:49:56.182-06:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-15T17:49:56.182-06:00 level=TRACE source=runner.go:421 msg="starting runner for device discovery" libDirs="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v12]" extraEnvs=map[]
time=2025-11-15T17:49:56.182-06:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 34107"
time=2025-11-15T17:49:56.182-06:00 level=DEBUG source=server.go:393 msg=subprocess OLLAMA_VULKAN=1 OLLAMA_DEBUG=2 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12
time=2025-11-15T17:49:56.194-06:00 level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-11-15T17:49:56.194-06:00 level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:34107"
time=2025-11-15T17:49:56.204-06:00 level=DEBUG source=gguf.go:590 msg=general.architecture type=string
time=2025-11-15T17:49:56.204-06:00 level=DEBUG source=gguf.go:590 msg=tokenizer.ggml.model type=string
time=2025-11-15T17:49:56.204-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-15T17:49:56.204-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-15T17:49:56.204-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.file_type default=0
time=2025-11-15T17:49:56.204-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default=""
time=2025-11-15T17:49:56.204-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""
time=2025-11-15T17:49:56.204-06:00 level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2025-11-15T17:49:56.204-06:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v12
time=2025-11-15T17:49:56.219-06:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.pooling_type default=0
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.expert_count default=0
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.219-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2025-11-15T17:49:56.220-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0
time=2025-11-15T17:49:56.220-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.embedding_length default=0
time=2025-11-15T17:49:56.220-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count default=0
time=2025-11-15T17:49:56.220-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2025-11-15T17:49:56.220-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.key_length default=0
time=2025-11-15T17:49:56.220-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2025-11-15T17:49:56.220-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2025-11-15T17:49:56.220-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2025-11-15T17:49:56.220-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2025-11-15T17:49:56.220-06:00 level=DEBUG source=runner.go:1373 msg="dummy model load took" duration=15.769466ms
time=2025-11-15T17:49:56.220-06:00 level=DEBUG source=runner.go:1378 msg="gathering device infos took" duration=576ns
time=2025-11-15T17:49:56.220-06:00 level=TRACE source=runner.go:448 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v12]" devices=[]
time=2025-11-15T17:49:56.220-06:00 level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=37.884674ms OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v12]" extra_envs=map[]
time=2025-11-15T17:49:56.220-06:00 level=TRACE source=runner.go:421 msg="starting runner for device discovery" libDirs="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v13]" extraEnvs=map[]
time=2025-11-15T17:49:56.220-06:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 43593"
time=2025-11-15T17:49:56.220-06:00 level=DEBUG source=server.go:393 msg=subprocess OLLAMA_VULKAN=1 OLLAMA_DEBUG=2 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v13 OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v13
time=2025-11-15T17:49:56.233-06:00 level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-11-15T17:49:56.233-06:00 level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:43593"
time=2025-11-15T17:49:56.242-06:00 level=DEBUG source=gguf.go:590 msg=general.architecture type=string
time=2025-11-15T17:49:56.242-06:00 level=DEBUG source=gguf.go:590 msg=tokenizer.ggml.model type=string
time=2025-11-15T17:49:56.242-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-15T17:49:56.242-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-15T17:49:56.242-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.file_type default=0
time=2025-11-15T17:49:56.242-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default=""
time=2025-11-15T17:49:56.242-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""
time=2025-11-15T17:49:56.242-06:00 level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2025-11-15T17:49:56.242-06:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
time=2025-11-15T17:49:56.248-06:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama/cuda_v13
time=2025-11-15T17:49:56.253-06:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.pooling_type default=0
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.expert_count default=0
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.embedding_length default=0
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count default=0
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.key_length default=0
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=runner.go:1373 msg="dummy model load took" duration=11.460398ms
time=2025-11-15T17:49:56.253-06:00 level=DEBUG source=runner.go:1378 msg="gathering device infos took" duration=473ns
time=2025-11-15T17:49:56.253-06:00 level=TRACE source=runner.go:448 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v13]" devices=[]
time=2025-11-15T17:49:56.254-06:00 level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=33.612633ms OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v13]" extra_envs=map[]
time=2025-11-15T17:49:56.254-06:00 level=TRACE source=runner.go:421 msg="starting runner for device discovery" libDirs="[/usr/local/lib/ollama /usr/local/lib/ollama/rocm]" extraEnvs=map[]
time=2025-11-15T17:49:56.254-06:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 35429"
time=2025-11-15T17:49:56.254-06:00 level=DEBUG source=server.go:393 msg=subprocess OLLAMA_VULKAN=1 OLLAMA_DEBUG=2 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/rocm OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/rocm
time=2025-11-15T17:49:56.265-06:00 level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-11-15T17:49:56.265-06:00 level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:35429"
time=2025-11-15T17:49:56.276-06:00 level=DEBUG source=gguf.go:590 msg=general.architecture type=string
time=2025-11-15T17:49:56.276-06:00 level=DEBUG source=gguf.go:590 msg=tokenizer.ggml.model type=string
time=2025-11-15T17:49:56.276-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-15T17:49:56.276-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.alignment default=32
time=2025-11-15T17:49:56.276-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.file_type default=0
time=2025-11-15T17:49:56.276-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.name default=""
time=2025-11-15T17:49:56.276-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=general.description default=""
time=2025-11-15T17:49:56.276-06:00 level=INFO source=ggml.go:136 msg="" architecture=llama file_type=unknown name="" description="" num_tensors=0 num_key_values=3
time=2025-11-15T17:49:56.276-06:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
time=2025-11-15T17:49:56.281-06:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama/rocm
ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected
load_backend: loaded ROCm backend from /usr/local/lib/ollama/rocm/libggml-hip.so
time=2025-11-15T17:49:56.513-06:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-11-15T17:49:56.513-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.pooling_type default=0
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.expert_count default=0
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.tokens default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.scores default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.token_type default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.merges default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=true
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.add_eos_token default=false
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_id default=0
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=tokenizer.ggml.pre default=""
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.block_count default=0
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.embedding_length default=0
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count default=0
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.head_count_kv default=0
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.key_length default=0
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.dimension_count default=0
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.attention.layer_norm_rms_epsilon default=0
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.freq_base default=100000
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=ggml.go:276 msg="key with type not found" key=llama.rope.scaling.factor default=1
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=runner.go:1373 msg="dummy model load took" duration=238.200778ms
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=runner.go:1378 msg="gathering device infos took" duration=468ns
time=2025-11-15T17:49:56.514-06:00 level=TRACE source=runner.go:448 msg="runner enumerated devices" OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/rocm]" devices=[]
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=260.409362ms OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/rocm]" extra_envs=map[]
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=runner.go:116 msg="evluating which if any devices to filter out" initial_count=0
time=2025-11-15T17:49:56.514-06:00 level=TRACE source=runner.go:156 msg="supported GPU library combinations before filtering" supported=map[]
time=2025-11-15T17:49:56.514-06:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=332.291539ms
time=2025-11-15T17:49:56.514-06:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="11.7 GiB" available="11.1 GiB"
time=2025-11-15T17:49:56.514-06:00 level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"

@rick-github commented on GitHub (Nov 15, 2025):

Output of ls -l /usr/local/lib/ollama?


@baldpope commented on GitHub (Nov 15, 2025):

$ ls -l /usr/local/lib/ollama
total 6104
drwxr-xr-x 2 root root   4096 Nov 13 16:16 cuda_v12
drwxr-xr-x 2 root root   4096 Nov 13 16:14 cuda_v13
-rwxr-xr-x 1 root root 669912 Nov 13 16:00 libggml-base.so
-rwxr-xr-x 1 root root 832784 Nov 13 16:00 libggml-cpu-alderlake.so
-rwxr-xr-x 1 root root 832784 Nov 13 16:00 libggml-cpu-haswell.so
-rwxr-xr-x 1 root root 963856 Nov 13 16:00 libggml-cpu-icelake.so
-rwxr-xr-x 1 root root 775504 Nov 13 16:00 libggml-cpu-sandybridge.so
-rwxr-xr-x 1 root root 963856 Nov 13 16:00 libggml-cpu-skylakex.so
-rwxr-xr-x 1 root root 591312 Nov 13 16:00 libggml-cpu-sse42.so
-rwxr-xr-x 1 root root 587216 Nov 13 16:00 libggml-cpu-x64.so
drwxr-xr-x 3 root root   4096 Nov 13 16:09 rocm

@rick-github commented on GitHub (Nov 16, 2025):

Oops, it looks like the Vulkan backend is not packaged into the linux tarballs. It's available in the docker image, so this appears to be a packaging issue during release.
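
A quick way to confirm what a given release ships is to compare the tarball's lib directory with the docker image (the image tag and path below are taken from the listing later in this thread):

$ ls /usr/local/lib/ollama                 # tarball install: no vulkan/ subdirectory
$ docker run --rm --entrypoint ls ollama/ollama:0.12.11 /usr/lib/ollama    # image: includes vulkan/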


@baldpope commented on GitHub (Nov 16, 2025):

Looking at the Dockerfile, it looks like I could grab/install the vulkan package starting here: https://github.com/ollama/ollama/blob/72ff5b9d8c7a07df46f7a7db68a42562ddab2994/Dockerfile#L22

Is it just a matter of grabbing and installing appropriate files?


@rick-github commented on GitHub (Nov 16, 2025):

You want these files from the docker image:

$ docker run --rm --entrypoint bash ollama/ollama:0.12.11 -c 'ls -l /usr/lib/ollama/vulkan'
total 51760
-rwxr-xr-x 1 root root 48528200 Nov 13 22:01 libggml-vulkan.so
lrwxrwxrwx 1 root root       20 Nov 13 22:01 libvulkan.so.1 -> libvulkan.so.1.4.321
-rwxr-xr-x 1 root root  4466776 Nov 13 22:00 libvulkan.so.1.4.321
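
If pulling the image is feasible, one way to copy them out without starting a container is docker create + docker cp (a sketch; the destination assumes the tarball install path from earlier):

$ docker create --name ollama-tmp ollama/ollama:0.12.11
$ sudo docker cp ollama-tmp:/usr/lib/ollama/vulkan /usr/local/lib/ollama/vulkan
$ docker rm ollama-tmp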

@rick-github commented on GitHub (Nov 16, 2025):

#13104


@baldpope commented on GitHub (Nov 16, 2025):

Instead of grabbing the files directly from the docker image (which would be difficult for me currently), could/should I install the latest Vulkan SDK (or pin to 321) from https://vulkan.lunarg.com/sdk/home? And since I don't currently have the /usr/lib/ollama/vulkan directory, would I just place the shared object files in that specific path?

Additionally, I don't have libggml-vulkan.so as a file anywhere on my file system under /usr/local/lib/ollama:

/usr$ find . -type f -name "libggml*"
./local/lib/ollama/libggml-cpu-sse42.so
./local/lib/ollama/libggml-cpu-haswell.so
./local/lib/ollama/libggml-cpu-alderlake.so
./local/lib/ollama/libggml-cpu-x64.so
./local/lib/ollama/cuda_v13/libggml-cuda.so
./local/lib/ollama/cuda_v12/libggml-cuda.so
./local/lib/ollama/libggml-cpu-skylakex.so
./local/lib/ollama/rocm/libggml-hip.so
./local/lib/ollama/libggml-cpu-sandybridge.so
./local/lib/ollama/libggml-cpu-icelake.so
./local/lib/ollama/libggml-base.so

@rick-github commented on GitHub (Nov 16, 2025):

The SDK will only help if you want to compile ollama. I extracted the Vulkan backend from the docker image; you can download it from https://github.com/rick-github/assets/raw/refs/heads/main/vulkan.tgz

cd /tmp && curl -LO https://github.com/rick-github/assets/raw/refs/heads/main/vulkan.tgz
sudo tar zxf vulkan.tgz -C /usr/local/lib/ollama/

I don't know if this will actually work.
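
If the archive preserves the layout from the docker image, a quick sanity check after extracting (assuming the default install prefix) is:

$ ls -l /usr/local/lib/ollama/vulkan

which should show libggml-vulkan.so and the libvulkan.so.1 loader next to it.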


@baldpope commented on GitHub (Nov 16, 2025):

Extracted the files as referenced, and I see the Arc cards identified:

$ OLLAMA_VULKAN=1 OLLAMA_DEBUG=1 ollama serve
time=2025-11-15T19:54:25.394-06:00 level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/jofficer/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-11-15T19:54:25.395-06:00 level=INFO source=images.go:522 msg="total blobs: 4"
time=2025-11-15T19:54:25.395-06:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-15T19:54:25.395-06:00 level=INFO source=routes.go:1597 msg="Listening on 127.0.0.1:11434 (version 0.12.11)"
time=2025-11-15T19:54:25.395-06:00 level=DEBUG source=sched.go:120 msg="starting llm scheduler"
time=2025-11-15T19:54:25.396-06:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-15T19:54:25.396-06:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40013"
time=2025-11-15T19:54:25.396-06:00 level=DEBUG source=server.go:393 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12:/opt/intel/oneapi/tcm/1.4/lib:/opt/intel/oneapi/umf/0.11/lib:/opt/intel/oneapi/tbb/2022.2/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/pti/0.13/lib:/opt/intel/oneapi/mpi/2021.16/opt/mpi/libfabric/lib:/opt/intel/oneapi/mpi/2021.16/lib:/opt/intel/oneapi/mkl/2025.2/lib:/opt/intel/oneapi/ippcp/2025.2/lib/:/opt/intel/oneapi/ipp/2022.2/lib:/opt/intel/oneapi/dnnl/2025.2/lib:/opt/intel/oneapi/debugger/2025.2/opt/debugger/lib:/opt/intel/oneapi/dal/2025.8/lib:/opt/intel/oneapi/compiler/2025.2/opt/compiler/lib:/opt/intel/oneapi/compiler/2025.2/lib:/opt/intel/oneapi/ccl/2021.16/lib/ PATH=/opt/intel/oneapi/vtune/2025.4/bin64:/opt/intel/oneapi/mpi/2021.16/bin:/opt/intel/oneapi/mkl/2025.2/bin:/opt/intel/oneapi/dpcpp-ct/2025.2/bin:/opt/intel/oneapi/dev-utilities/2025.2/bin:/opt/intel/oneapi/debugger/2025.2/opt/debugger/bin:/opt/intel/oneapi/compiler/2025.2/bin:/opt/intel/oneapi/advisor/2025.2/bin64:/home/jofficer/.local/bin:/home/jofficer/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12
time=2025-11-15T19:54:25.424-06:00 level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=28.289006ms OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v12]" extra_envs=map[]
time=2025-11-15T19:54:25.424-06:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 43693"
time=2025-11-15T19:54:25.424-06:00 level=DEBUG source=server.go:393 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v13:/opt/intel/oneapi/tcm/1.4/lib:/opt/intel/oneapi/umf/0.11/lib:/opt/intel/oneapi/tbb/2022.2/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/pti/0.13/lib:/opt/intel/oneapi/mpi/2021.16/opt/mpi/libfabric/lib:/opt/intel/oneapi/mpi/2021.16/lib:/opt/intel/oneapi/mkl/2025.2/lib:/opt/intel/oneapi/ippcp/2025.2/lib/:/opt/intel/oneapi/ipp/2022.2/lib:/opt/intel/oneapi/dnnl/2025.2/lib:/opt/intel/oneapi/debugger/2025.2/opt/debugger/lib:/opt/intel/oneapi/dal/2025.8/lib:/opt/intel/oneapi/compiler/2025.2/opt/compiler/lib:/opt/intel/oneapi/compiler/2025.2/lib:/opt/intel/oneapi/ccl/2021.16/lib/ PATH=/opt/intel/oneapi/vtune/2025.4/bin64:/opt/intel/oneapi/mpi/2021.16/bin:/opt/intel/oneapi/mkl/2025.2/bin:/opt/intel/oneapi/dpcpp-ct/2025.2/bin:/opt/intel/oneapi/dev-utilities/2025.2/bin:/opt/intel/oneapi/debugger/2025.2/opt/debugger/bin:/opt/intel/oneapi/compiler/2025.2/bin:/opt/intel/oneapi/advisor/2025.2/bin64:/home/jofficer/.local/bin:/home/jofficer/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v13
time=2025-11-15T19:54:25.442-06:00 level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=17.861598ms OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v13]" extra_envs=map[]
time=2025-11-15T19:54:25.442-06:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 41529"
time=2025-11-15T19:54:25.442-06:00 level=DEBUG source=server.go:393 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/rocm:/opt/intel/oneapi/tcm/1.4/lib:/opt/intel/oneapi/umf/0.11/lib:/opt/intel/oneapi/tbb/2022.2/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/pti/0.13/lib:/opt/intel/oneapi/mpi/2021.16/opt/mpi/libfabric/lib:/opt/intel/oneapi/mpi/2021.16/lib:/opt/intel/oneapi/mkl/2025.2/lib:/opt/intel/oneapi/ippcp/2025.2/lib/:/opt/intel/oneapi/ipp/2022.2/lib:/opt/intel/oneapi/dnnl/2025.2/lib:/opt/intel/oneapi/debugger/2025.2/opt/debugger/lib:/opt/intel/oneapi/dal/2025.8/lib:/opt/intel/oneapi/compiler/2025.2/opt/compiler/lib:/opt/intel/oneapi/compiler/2025.2/lib:/opt/intel/oneapi/ccl/2021.16/lib/ PATH=/opt/intel/oneapi/vtune/2025.4/bin64:/opt/intel/oneapi/mpi/2021.16/bin:/opt/intel/oneapi/mkl/2025.2/bin:/opt/intel/oneapi/dpcpp-ct/2025.2/bin:/opt/intel/oneapi/dev-utilities/2025.2/bin:/opt/intel/oneapi/debugger/2025.2/opt/debugger/bin:/opt/intel/oneapi/compiler/2025.2/bin:/opt/intel/oneapi/advisor/2025.2/bin64:/home/jofficer/.local/bin:/home/jofficer/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/rocm
time=2025-11-15T19:54:25.492-06:00 level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=49.861094ms OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/rocm]" extra_envs=map[]
time=2025-11-15T19:54:25.492-06:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 36615"
time=2025-11-15T19:54:25.492-06:00 level=DEBUG source=server.go:393 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/vulkan:/opt/intel/oneapi/tcm/1.4/lib:/opt/intel/oneapi/umf/0.11/lib:/opt/intel/oneapi/tbb/2022.2/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/pti/0.13/lib:/opt/intel/oneapi/mpi/2021.16/opt/mpi/libfabric/lib:/opt/intel/oneapi/mpi/2021.16/lib:/opt/intel/oneapi/mkl/2025.2/lib:/opt/intel/oneapi/ippcp/2025.2/lib/:/opt/intel/oneapi/ipp/2022.2/lib:/opt/intel/oneapi/dnnl/2025.2/lib:/opt/intel/oneapi/debugger/2025.2/opt/debugger/lib:/opt/intel/oneapi/dal/2025.8/lib:/opt/intel/oneapi/compiler/2025.2/opt/compiler/lib:/opt/intel/oneapi/compiler/2025.2/lib:/opt/intel/oneapi/ccl/2021.16/lib/ PATH=/opt/intel/oneapi/vtune/2025.4/bin64:/opt/intel/oneapi/mpi/2021.16/bin:/opt/intel/oneapi/mkl/2025.2/bin:/opt/intel/oneapi/dpcpp-ct/2025.2/bin:/opt/intel/oneapi/dev-utilities/2025.2/bin:/opt/intel/oneapi/debugger/2025.2/opt/debugger/bin:/opt/intel/oneapi/compiler/2025.2/bin:/opt/intel/oneapi/advisor/2025.2/bin64:/home/jofficer/.local/bin:/home/jofficer/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/vulkan
time=2025-11-15T19:54:25.561-06:00 level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=69.560615ms OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/vulkan]" extra_envs=map[]
time=2025-11-15T19:54:25.561-06:00 level=DEBUG source=runner.go:116 msg="evluating which if any devices to filter out" initial_count=2
time=2025-11-15T19:54:25.562-06:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=166.072068ms
time=2025-11-15T19:54:25.562-06:00 level=INFO source=types.go:42 msg="inference compute" id=8680a556-0500-0000-0203-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="Intel(R) Arc(tm) A380 Graphics (DG2)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:02:03.0 type=discrete total="5.9 GiB" available="5.4 GiB"
time=2025-11-15T19:54:25.562-06:00 level=INFO source=types.go:42 msg="inference compute" id=8680a556-0500-0000-0204-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan1 description="Intel(R) Arc(tm) A380 Graphics (DG2)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:02:04.0 type=discrete total="5.9 GiB" available="5.4 GiB"
time=2025-11-15T19:54:25.562-06:00 level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="11.9 GiB" threshold="20.0 GiB"

However, when I run ollama run qwen3-vl:8b-instruct-q4_K_M, the load never finishes. Full debug log attached:

192.168.4.162-2025-11-15.console.log (https://github.com/user-attachments/files/23565252/192.168.4.162-2025-11-15.console.log)

<!-- gh-comment-id:3537359536 --> @baldpope commented on GitHub (Nov 16, 2025): Extracte the files as referenced, and I see the Arc cards identified: ``` $ OLLAMA_VULKAN=1 OLLAMA_DEBUG=1 ollama serve time=2025-11-15T19:54:25.394-06:00 level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/jofficer/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" time=2025-11-15T19:54:25.395-06:00 level=INFO source=images.go:522 msg="total blobs: 4" time=2025-11-15T19:54:25.395-06:00 level=INFO source=images.go:529 msg="total unused blobs removed: 0" time=2025-11-15T19:54:25.395-06:00 level=INFO source=routes.go:1597 msg="Listening on 127.0.0.1:11434 (version 0.12.11)" time=2025-11-15T19:54:25.395-06:00 level=DEBUG source=sched.go:120 msg="starting llm scheduler" time=2025-11-15T19:54:25.396-06:00 level=INFO source=runner.go:67 msg="discovering available GPUs..." 
time=2025-11-15T19:54:25.396-06:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40013"
time=2025-11-15T19:54:25.396-06:00 level=DEBUG source=server.go:393 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12:/opt/intel/oneapi/tcm/1.4/lib:/opt/intel/oneapi/umf/0.11/lib:/opt/intel/oneapi/tbb/2022.2/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/pti/0.13/lib:/opt/intel/oneapi/mpi/2021.16/opt/mpi/libfabric/lib:/opt/intel/oneapi/mpi/2021.16/lib:/opt/intel/oneapi/mkl/2025.2/lib:/opt/intel/oneapi/ippcp/2025.2/lib/:/opt/intel/oneapi/ipp/2022.2/lib:/opt/intel/oneapi/dnnl/2025.2/lib:/opt/intel/oneapi/debugger/2025.2/opt/debugger/lib:/opt/intel/oneapi/dal/2025.8/lib:/opt/intel/oneapi/compiler/2025.2/opt/compiler/lib:/opt/intel/oneapi/compiler/2025.2/lib:/opt/intel/oneapi/ccl/2021.16/lib/ PATH=/opt/intel/oneapi/vtune/2025.4/bin64:/opt/intel/oneapi/mpi/2021.16/bin:/opt/intel/oneapi/mkl/2025.2/bin:/opt/intel/oneapi/dpcpp-ct/2025.2/bin:/opt/intel/oneapi/dev-utilities/2025.2/bin:/opt/intel/oneapi/debugger/2025.2/opt/debugger/bin:/opt/intel/oneapi/compiler/2025.2/bin:/opt/intel/oneapi/advisor/2025.2/bin64:/home/jofficer/.local/bin:/home/jofficer/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12
time=2025-11-15T19:54:25.424-06:00 level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=28.289006ms OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v12]" extra_envs=map[]
time=2025-11-15T19:54:25.424-06:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 43693"
time=2025-11-15T19:54:25.424-06:00 level=DEBUG source=server.go:393 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v13:/opt/intel/oneapi/tcm/1.4/lib:/opt/intel/oneapi/umf/0.11/lib:/opt/intel/oneapi/tbb/2022.2/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/pti/0.13/lib:/opt/intel/oneapi/mpi/2021.16/opt/mpi/libfabric/lib:/opt/intel/oneapi/mpi/2021.16/lib:/opt/intel/oneapi/mkl/2025.2/lib:/opt/intel/oneapi/ippcp/2025.2/lib/:/opt/intel/oneapi/ipp/2022.2/lib:/opt/intel/oneapi/dnnl/2025.2/lib:/opt/intel/oneapi/debugger/2025.2/opt/debugger/lib:/opt/intel/oneapi/dal/2025.8/lib:/opt/intel/oneapi/compiler/2025.2/opt/compiler/lib:/opt/intel/oneapi/compiler/2025.2/lib:/opt/intel/oneapi/ccl/2021.16/lib/ PATH=/opt/intel/oneapi/vtune/2025.4/bin64:/opt/intel/oneapi/mpi/2021.16/bin:/opt/intel/oneapi/mkl/2025.2/bin:/opt/intel/oneapi/dpcpp-ct/2025.2/bin:/opt/intel/oneapi/dev-utilities/2025.2/bin:/opt/intel/oneapi/debugger/2025.2/opt/debugger/bin:/opt/intel/oneapi/compiler/2025.2/bin:/opt/intel/oneapi/advisor/2025.2/bin64:/home/jofficer/.local/bin:/home/jofficer/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v13
time=2025-11-15T19:54:25.442-06:00 level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=17.861598ms OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/cuda_v13]" extra_envs=map[]
time=2025-11-15T19:54:25.442-06:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 41529"
time=2025-11-15T19:54:25.442-06:00 level=DEBUG source=server.go:393 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/rocm:/opt/intel/oneapi/tcm/1.4/lib:/opt/intel/oneapi/umf/0.11/lib:/opt/intel/oneapi/tbb/2022.2/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/pti/0.13/lib:/opt/intel/oneapi/mpi/2021.16/opt/mpi/libfabric/lib:/opt/intel/oneapi/mpi/2021.16/lib:/opt/intel/oneapi/mkl/2025.2/lib:/opt/intel/oneapi/ippcp/2025.2/lib/:/opt/intel/oneapi/ipp/2022.2/lib:/opt/intel/oneapi/dnnl/2025.2/lib:/opt/intel/oneapi/debugger/2025.2/opt/debugger/lib:/opt/intel/oneapi/dal/2025.8/lib:/opt/intel/oneapi/compiler/2025.2/opt/compiler/lib:/opt/intel/oneapi/compiler/2025.2/lib:/opt/intel/oneapi/ccl/2021.16/lib/ PATH=/opt/intel/oneapi/vtune/2025.4/bin64:/opt/intel/oneapi/mpi/2021.16/bin:/opt/intel/oneapi/mkl/2025.2/bin:/opt/intel/oneapi/dpcpp-ct/2025.2/bin:/opt/intel/oneapi/dev-utilities/2025.2/bin:/opt/intel/oneapi/debugger/2025.2/opt/debugger/bin:/opt/intel/oneapi/compiler/2025.2/bin:/opt/intel/oneapi/advisor/2025.2/bin64:/home/jofficer/.local/bin:/home/jofficer/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/rocm
time=2025-11-15T19:54:25.492-06:00 level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=49.861094ms OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/rocm]" extra_envs=map[]
time=2025-11-15T19:54:25.492-06:00 level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 36615"
time=2025-11-15T19:54:25.492-06:00 level=DEBUG source=server.go:393 msg=subprocess OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/vulkan:/opt/intel/oneapi/tcm/1.4/lib:/opt/intel/oneapi/umf/0.11/lib:/opt/intel/oneapi/tbb/2022.2/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/pti/0.13/lib:/opt/intel/oneapi/mpi/2021.16/opt/mpi/libfabric/lib:/opt/intel/oneapi/mpi/2021.16/lib:/opt/intel/oneapi/mkl/2025.2/lib:/opt/intel/oneapi/ippcp/2025.2/lib/:/opt/intel/oneapi/ipp/2022.2/lib:/opt/intel/oneapi/dnnl/2025.2/lib:/opt/intel/oneapi/debugger/2025.2/opt/debugger/lib:/opt/intel/oneapi/dal/2025.8/lib:/opt/intel/oneapi/compiler/2025.2/opt/compiler/lib:/opt/intel/oneapi/compiler/2025.2/lib:/opt/intel/oneapi/ccl/2021.16/lib/ PATH=/opt/intel/oneapi/vtune/2025.4/bin64:/opt/intel/oneapi/mpi/2021.16/bin:/opt/intel/oneapi/mkl/2025.2/bin:/opt/intel/oneapi/dpcpp-ct/2025.2/bin:/opt/intel/oneapi/dev-utilities/2025.2/bin:/opt/intel/oneapi/debugger/2025.2/opt/debugger/bin:/opt/intel/oneapi/compiler/2025.2/bin:/opt/intel/oneapi/advisor/2025.2/bin64:/home/jofficer/.local/bin:/home/jofficer/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin OLLAMA_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/vulkan
time=2025-11-15T19:54:25.561-06:00 level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=69.560615ms OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/vulkan]" extra_envs=map[]
time=2025-11-15T19:54:25.561-06:00 level=DEBUG source=runner.go:116 msg="evluating which if any devices to filter out" initial_count=2
time=2025-11-15T19:54:25.562-06:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=166.072068ms
time=2025-11-15T19:54:25.562-06:00 level=INFO source=types.go:42 msg="inference compute" id=8680a556-0500-0000-0203-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="Intel(R) Arc(tm) A380 Graphics (DG2)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:02:03.0 type=discrete total="5.9 GiB" available="5.4 GiB"
time=2025-11-15T19:54:25.562-06:00 level=INFO source=types.go:42 msg="inference compute" id=8680a556-0500-0000-0204-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan1 description="Intel(R) Arc(tm) A380 Graphics (DG2)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:02:04.0 type=discrete total="5.9 GiB" available="5.4 GiB"
time=2025-11-15T19:54:25.562-06:00 level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="11.9 GiB" threshold="20.0 GiB"
```

However, when I run **ollama run qwen3-vl:8b-instruct-q4_K_M** the load never finishes. Full debug log attached: [192.168.4.162-2025-11-15.console.log](https://github.com/user-attachments/files/23565252/192.168.4.162-2025-11-15.console.log)

@rick-github commented on GitHub (Nov 16, 2025):

```
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan1 buffer of size 4561861680
```

Looks like the memory calculations are wrong. You can try some of the OOM mitigations shown [here](https://github.com/ollama/ollama/issues/8597#issuecomment-2614533288) to see if you can get the model to load.
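
A minimal sketch of those mitigations adapted to this setup; all of these variables appear in the server config dump above, but the specific values here are illustrative, not tuned:

```
# Illustrative OOM mitigations -- exact values depend on the model and your ~6 GiB cards
export OLLAMA_VULKAN=1
export OLLAMA_FLASH_ATTENTION=1     # flash attention shrinks compute buffers
export OLLAMA_KV_CACHE_TYPE=q8_0    # quantized KV cache (requires flash attention)
export OLLAMA_CONTEXT_LENGTH=2048   # halve the default 4096 context to shrink the KV cache
ollama serve
```

Inside an `ollama run` session you can also reduce the number of offloaded layers with `/set parameter num_gpu <n>` and let the remainder run on the CPU.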


@rick-github commented on GitHub (Nov 16, 2025):

Actually, most of those failures may just be ollama trying various configurations to fit the model into available memory.


@baldpope commented on GitHub (Nov 16, 2025):

It's quite possible that I simply don't have enough resources to load the model, though, for what it's worth, I did have success loading the model and mmproj files with llama.cpp. Is it possible it's failing because it's identifying the on-board GPU (technically the CPU) when it should ONLY use the two Arc cards?

For reference:

```
Devices:
========
GPU0:
        apiVersion         = 1.4.305
        driverVersion      = 25.0.7
        vendorID           = 0x8086
        deviceID           = 0x56a5
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = Intel(R) Arc(tm) A380 Graphics (DG2)
        driverID           = DRIVER_ID_INTEL_OPEN_SOURCE_MESA
        driverName         = Intel open-source Mesa driver
        driverInfo         = Mesa 25.0.7-0ubuntu0.24.04.2
        conformanceVersion = 1.4.0.0
        deviceUUID         = 8680a556-0500-0000-0203-000000000000
        driverUUID         = 802b0057-40c2-aed9-e538-d78b797f04f4
GPU1:
        apiVersion         = 1.4.305
        driverVersion      = 25.0.7
        vendorID           = 0x8086
        deviceID           = 0x56a5
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = Intel(R) Arc(tm) A380 Graphics (DG2)
        driverID           = DRIVER_ID_INTEL_OPEN_SOURCE_MESA
        driverName         = Intel open-source Mesa driver
        driverInfo         = Mesa 25.0.7-0ubuntu0.24.04.2
        conformanceVersion = 1.4.0.0
        deviceUUID         = 8680a556-0500-0000-0204-000000000000
        driverUUID         = 802b0057-40c2-aed9-e538-d78b797f04f4
GPU2:
        apiVersion         = 1.4.305
        driverVersion      = 0.0.1
        vendorID           = 0x10005
        deviceID           = 0x0000
        deviceType         = PHYSICAL_DEVICE_TYPE_CPU
        deviceName         = llvmpipe (LLVM 20.1.2, 256 bits)
        driverID           = DRIVER_ID_MESA_LLVMPIPE
        driverName         = llvmpipe
        driverInfo         = Mesa 25.0.7-0ubuntu0.24.04.2 (LLVM 20.1.2)
        conformanceVersion = 1.3.1.1
        deviceUUID         = 6d657361-3235-2e30-2e37-2d3075627500
        driverUUID         = 6c6c766d-7069-7065-5555-494400000000
```

So GPU0 and GPU1 are my Arc cards; I want to exclude GPU2 (the llvmpipe CPU device) from use.
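
If device filtering does turn out to be necessary, the server config dump above lists a `GGML_VK_VISIBLE_DEVICES` variable; a hedged sketch, assuming its indices follow the `vulkaninfo` enumeration shown here:

```
# Assumption: indices 0 and 1 are the two Arc A380s, 2 is the llvmpipe CPU device
export GGML_VK_VISIBLE_DEVICES=0,1
OLLAMA_VULKAN=1 ollama serve
```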


@rick-github commented on GitHub (Nov 20, 2025):

```
time=2025-11-15T19:54:25.562-06:00 level=INFO source=types.go:42 msg="inference compute"
 id=8680a556-0500-0000-0203-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0
 description="Intel(R) Arc(tm) A380 Graphics (DG2)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:02:03.0 type=discrete
 total="5.9 GiB" available="5.4 GiB"
time=2025-11-15T19:54:25.562-06:00 level=INFO source=types.go:42 msg="inference compute"
 id=8680a556-0500-0000-0204-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan1
 description="Intel(R) Arc(tm) A380 Graphics (DG2)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:02:04.0 type=discrete
 total="5.9 GiB" available="5.4 GiB"
```

Logs indicate that only the A380s are being used.


@dhiltgen commented on GitHub (Nov 21, 2025):

@baldpope do things work correctly if you load a smaller model that fits on 1 or both GPUs? If you haven't tried yet, please give [0.13.0](https://github.com/ollama/ollama/releases) a try and see if it behaves better.
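
As a quick sanity check (model choice illustrative), a small model that comfortably fits on one card can be timed with the CLI's built-in stats:

```
# --verbose prints load duration and prompt/eval token rates after each reply
ollama run gemma3:270m --verbose
```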


@baldpope commented on GitHub (Dec 1, 2025):

I tried running Ollama 0.13, and startup appears to recognize the GPUs, but the client (`ollama run $model`) never returns a prompt.

Running the server in one console:

```
OLLAMA_VULKAN=1 OLLAMA_MODELS=/opt/ollama/models/ OLLAMA_HOST=0.0.0.0 ollama serve

time=2025-12-01T03:18:21.493Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/opt/ollama/models/ OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-12-01T03:18:21.495Z level=INFO source=images.go:522 msg="total blobs: 14"
time=2025-12-01T03:18:21.496Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-12-01T03:18:21.496Z level=INFO source=routes.go:1597 msg="Listening on [::]:11434 (version 0.13.0)"
time=2025-12-01T03:18:21.496Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-12-01T03:18:21.497Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 34965"
time=2025-12-01T03:18:21.555Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 34871"
time=2025-12-01T03:18:21.584Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 37047"
time=2025-12-01T03:18:21.612Z level=INFO source=types.go:42 msg="inference compute" id=8680a556-0500-0000-0203-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="Intel(R) Arc(tm) A380 Graphics (DG2)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:02:03.0 type=discrete total="5.9 GiB" available="5.4 GiB"
time=2025-12-01T03:18:21.613Z level=INFO source=types.go:42 msg="inference compute" id=8680a556-0500-0000-0204-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan1 description="Intel(R) Arc(tm) A380 Graphics (DG2)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:02:04.0 type=discrete total="5.9 GiB" available="5.4 GiB"
time=2025-12-01T03:18:21.613Z level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="11.9 GiB" threshold="20.0 GiB"
[GIN] 2025/12/01 - 03:18:34 | 200 |      37.773µs |       127.0.0.1 | HEAD     "/"
[GIN] 2025/12/01 - 03:18:34 | 200 |  132.899515ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/12/01 - 03:18:34 | 200 |  121.494889ms |       127.0.0.1 | POST     "/api/show"
time=2025-12-01T03:18:35.200Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 42175"
time=2025-12-01T03:18:35.480Z level=INFO source=server.go:209 msg="enabling flash attention"
time=2025-12-01T03:18:35.480Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /opt/ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --port 41277"
time=2025-12-01T03:18:35.481Z level=INFO source=sched.go:443 msg="system memory" total="7.8 GiB" free="7.1 GiB" free_swap="2.0 GiB"
time=2025-12-01T03:18:35.481Z level=INFO source=sched.go:450 msg="gpu memory" id=8680a556-0500-0000-0203-000000000000 library=Vulkan available="4.9 GiB" free="5.4 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-12-01T03:18:35.481Z level=INFO source=sched.go:450 msg="gpu memory" id=8680a556-0500-0000-0204-000000000000 library=Vulkan available="4.9 GiB" free="5.4 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-12-01T03:18:35.481Z level=INFO source=server.go:702 msg="loading model" "model layers"=35 requested=-1
time=2025-12-01T03:18:35.492Z level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-12-01T03:18:35.492Z level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:41277"
time=2025-12-01T03:18:35.509Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:1 GPULayers:35[ID:8680a556-0500-0000-0203-000000000000 Layers:35(0..34)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-12-01T03:18:35.593Z level=INFO source=ggml.go:136 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(tm) A380 Graphics (DG2) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
ggml_vulkan: 1 = Intel(R) Arc(tm) A380 Graphics (DG2) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from /usr/local/lib/ollama/vulkan/libggml-vulkan.so
time=2025-12-01T03:18:35.631Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
ggml_backend_vk_get_device_memory called: uuid 8680a556-0500-0000-0203-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_backend_vk_get_device_memory called: uuid 8680a556-0500-0000-0204-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-12-01T03:18:35.959Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:1 GPULayers:35[ID:8680a556-0500-0000-0203-000000000000 Layers:35(0..34)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 8680a556-0500-0000-0203-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_backend_vk_get_device_memory called: uuid 8680a556-0500-0000-0204-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-12-01T03:18:36.532Z level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="3.1 GiB"
time=2025-12-01T03:18:36.532Z level=INFO source=device.go:245 msg="model weights" device=CPU size="525.0 MiB"
time=2025-12-01T03:18:36.532Z level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="254.0 MiB"
time=2025-12-01T03:18:36.532Z level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="1.1 GiB"
time=2025-12-01T03:18:36.532Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="5.0 MiB"
time=2025-12-01T03:18:36.532Z level=INFO source=device.go:272 msg="total memory" size="5.0 GiB"
time=2025-12-01T03:18:36.532Z level=INFO source=sched.go:517 msg="loaded runners" count=1
time=2025-12-01T03:18:36.532Z level=INFO source=server.go:1294 msg="waiting for llama runner to start responding"
time=2025-12-01T03:18:36.531Z level=INFO source=runner.go:1271 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:1 GPULayers:35[ID:8680a556-0500-0000-0203-000000000000 Layers:35(0..34)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-12-01T03:18:36.531Z level=INFO source=ggml.go:482 msg="offloading 34 repeating layers to GPU"
time=2025-12-01T03:18:36.531Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
time=2025-12-01T03:18:36.531Z level=INFO source=ggml.go:494 msg="offloaded 35/35 layers to GPU"
time=2025-12-01T03:18:36.532Z level=INFO source=server.go:1328 msg="waiting for server to become available" status="llm server loading model"
time=2025-12-01T03:18:50.832Z level=WARN source=server.go:1301 msg="client connection closed before server finished loading, aborting load"
time=2025-12-01T03:18:50.832Z level=ERROR source=sched.go:523 msg="error loading llama server" error="timed out waiting for llama runner to start: context canceled"
[GIN] 2025/12/01 - 03:18:50 | 499 | 15.891827166s |       127.0.0.1 | POST     "/api/generate"
time=2025-12-01T03:18:50.844Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 42351"
ggml_backend_vk_get_device_memory called: uuid 8680a556-0500-0000-0203-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_backend_vk_get_device_memory called: uuid 8680a556-0500-0000-0203-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-12-01T03:18:51.213Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 44325"
time=2025-12-01T03:18:51.463Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 43437"
time=2025-12-01T03:18:51.713Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 34233"
time=2025-12-01T03:18:51.963Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 36063"
time=2025-12-01T03:18:52.214Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 42453"
time=2025-12-01T03:18:52.464Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 36837"
time=2025-12-01T03:18:52.714Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 44539"
time=2025-12-01T03:18:52.963Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 36927"
time=2025-12-01T03:18:53.214Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 43439"
time=2025-12-01T03:18:53.463Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 42885"
time=2025-12-01T03:18:53.714Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 44249"
time=2025-12-01T03:18:53.964Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 42991"
time=2025-12-01T03:18:54.213Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 32853"
time=2025-12-01T03:18:54.463Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 45235"
time=2025-12-01T03:18:54.713Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 41279"
time=2025-12-01T03:18:54.963Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 46387"
time=2025-12-01T03:18:55.213Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40799"
time=2025-12-01T03:18:55.464Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 35777"
time=2025-12-01T03:18:55.713Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40851"
time=2025-12-01T03:18:55.964Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 39377"
time=2025-12-01T03:18:55.965Z level=INFO source=runner.go:449 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH="[/usr/local/lib/ollama /usr/local/lib/ollama/vulkan]" extra_envs=map[] error="failed to finish discovery before timeout"
time=2025-12-01T03:18:55.965Z level=WARN source=runner.go:341 msg="unable to refresh free memory, using old values"
```

Running the client:

`ollama run gemma3:latest`

*** UPDATE ***

I enabled debug (OLLAMA_DEBUG=1) while running the server, and I now see that it's taking a really long time to actually load the model (this could be specific to my system). I was able to load and use `gemma3:270m`.
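
Since the error above is a timeout ("timed out waiting for llama runner to start") rather than an allocation failure, a hedged workaround while investigating is to raise the load timeout, which the config dump shows defaulting to 5m0s. Note the 499 in the log indicates the client disconnected first, so the client also has to stay attached while the model loads:

```
# Assumption: the model eventually loads, just slower than the default window
OLLAMA_DEBUG=1 OLLAMA_LOAD_TIMEOUT=30m OLLAMA_VULKAN=1 \
  OLLAMA_MODELS=/opt/ollama/models/ OLLAMA_HOST=0.0.0.0 ollama serve
```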


@dhiltgen commented on GitHub (Dec 5, 2025):

@baldpope so it sounds like it's working, just really slow to load. Is that correct? How long does it take to load a model that roughly fits on one of your GPUs? Once it's loaded, what sort of token rate are you seeing?
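
One way to answer that precisely is via the generate API, whose response includes `load_duration`, `eval_count`, and `eval_duration` (all durations in nanoseconds). A sketch, assuming `jq` is installed:

```
# Reports model load time in seconds and generation speed in tokens/sec
curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model":"gemma3:latest","prompt":"hello","stream":false}' \
  | jq '{load_s: (.load_duration/1e9), tok_per_s: (.eval_count / (.eval_duration/1e9))}'
```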

Reference: github-starred/ollama#70733