[GH-ISSUE #13086] Vulkan on intel iGPU results in gibberish #8662

Closed
opened 2026-04-12 21:25:24 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @cibernox on GitHub (Nov 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13086

What is the issue?

I have a Proxmox server with an Intel 12th-gen i3 and an RTX 3060.

On that server I have an LXC container where I run Portainer, and I pass through both the NVIDIA GPU and the Intel iGPU to the container.

I've been running Ollama with CUDA on the NVIDIA card flawlessly for months.

Now that Ollama supports Vulkan, I wanted to also run a second Ollama container using Vulkan on the iGPU for smaller models.

When I run that second instance without Vulkan (meaning it runs on the CPU), models work fine. When I enable Vulkan I see no errors anywhere and the models run, but they output absolute garbage. I'm using qwen3:4B in Q4_K_M for my tests.

For completeness, this is the service entry in my docker-compose file, where I added a few flags just to be extra sure Ollama was using the right GPU, though I think it was using it even without them:

  ollama-vulkan:
    image: ollama/ollama
    container_name: ollama-vulkan
    restart: unless-stopped
    ports:
      - "11436:11434"
    volumes:
      - ./models:/root/.ollama # Same volume as the Cuda-based ollama, save HDD space.
    devices:
      - /dev/dri:/dev/dri
    environment:
      - OLLAMA_DEBUG=1
      - OLLAMA_VULKAN=1  
      - GGML_VK_VISIBLE_DEVICES=0  # Force GPU0 (Intel)
      - GGML_VULKAN_DEVICE=0        # Alternative variable
      - VK_DEVICE_SELECT_FORCE_DEFAULT_DEVICE=1  # Force first physical GPU  

Relevant log output

Nothing in the Ollama server logs seems particularly odd, and Ollama seems to detect the
"Intel(R) Graphics (ADL GT2)" iGPU.
There are some mentions of CUDA, but I assume those libraries are not used when Vulkan is enabled.

time=2025-11-14T12:55:52.272Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"

time=2025-11-14T12:55:52.272Z level=INFO source=images.go:522 msg="total blobs: 13"

time=2025-11-14T12:55:52.273Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"

time=2025-11-14T12:55:52.273Z level=INFO source=routes.go:1597 msg="Listening on [::]:11434 (version 0.12.11)"

time=2025-11-14T12:55:52.273Z level=DEBUG source=sched.go:120 msg="starting llm scheduler"

time=2025-11-14T12:55:52.273Z level=INFO source=runner.go:67 msg="discovering available GPUs..."

time=2025-11-14T12:55:52.273Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38321"

time=2025-11-14T12:55:52.273Z level=DEBUG source=server.go:393 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 GGML_VK_VISIBLE_DEVICES=0 GGML_VULKAN_DEVICE=0 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13

time=2025-11-14T12:55:52.305Z level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=31.776715ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[]

time=2025-11-14T12:55:52.305Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35489"

time=2025-11-14T12:55:52.305Z level=DEBUG source=server.go:393 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 GGML_VK_VISIBLE_DEVICES=0 GGML_VULKAN_DEVICE=0 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/vulkan:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/vulkan

time=2025-11-14T12:55:52.379Z level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=73.997807ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/vulkan]" extra_envs=map[]

time=2025-11-14T12:55:52.379Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40445"

time=2025-11-14T12:55:52.379Z level=DEBUG source=server.go:393 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 GGML_VK_VISIBLE_DEVICES=0 GGML_VULKAN_DEVICE=0 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12

time=2025-11-14T12:55:52.396Z level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=17.406299ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]

time=2025-11-14T12:55:52.396Z level=DEBUG source=runner.go:116 msg="evluating which if any devices to filter out" initial_count=1

time=2025-11-14T12:55:52.396Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=123.565523ms

time=2025-11-14T12:55:52.396Z level=INFO source=types.go:42 msg="inference compute" id=8680b346-0c00-0000-0002-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="Intel(R) Graphics (ADL GT2)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:00:02.0 type=iGPU total="15.4 GiB" available="13.5 GiB"

time=2025-11-14T12:55:52.396Z level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="15.4 GiB" threshold="20.0 GiB"

I SSH'd into the Docker container and ran vulkaninfo (which I had to install) to be sure everything looked right, and to me it does:

Vulkan Instance Version: 1.3.275


Instance Extensions: count = 24
-------------------------------
VK_EXT_acquire_drm_display             : extension revision 1
VK_EXT_acquire_xlib_display            : extension revision 1
VK_EXT_debug_report                    : extension revision 10
VK_EXT_debug_utils                     : extension revision 2
VK_EXT_direct_mode_display             : extension revision 1
VK_EXT_display_surface_counter         : extension revision 1
VK_EXT_headless_surface                : extension revision 1
VK_EXT_surface_maintenance1            : extension revision 1
VK_EXT_swapchain_colorspace            : extension revision 5
VK_KHR_device_group_creation           : extension revision 1
VK_KHR_display                         : extension revision 23
VK_KHR_external_fence_capabilities     : extension revision 1
VK_KHR_external_memory_capabilities    : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_display_properties2         : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2       : extension revision 1
VK_KHR_portability_enumeration         : extension revision 1
VK_KHR_surface                         : extension revision 25
VK_KHR_surface_protected_capabilities  : extension revision 1
VK_KHR_wayland_surface                 : extension revision 6
VK_KHR_xcb_surface                     : extension revision 6
VK_KHR_xlib_surface                    : extension revision 6
VK_LUNARG_direct_driver_loading        : extension revision 1

Instance Layers: count = 3
--------------------------
VK_LAYER_INTEL_nullhw       INTEL NULL HW                1.1.73   version 1
VK_LAYER_MESA_device_select Linux device selection layer 1.4.303  version 1
VK_LAYER_MESA_overlay       Mesa Overlay layer           1.4.303  version 1

Devices:
========
GPU0:
        apiVersion         = 1.4.305
        driverVersion      = 25.0.7
        vendorID           = 0x8086
        deviceID           = 0x46b3
        deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName         = Intel(R) Graphics (ADL GT2)
        driverID           = DRIVER_ID_INTEL_OPEN_SOURCE_MESA
        driverName         = Intel open-source Mesa driver
        driverInfo         = Mesa 25.0.7-0ubuntu0.24.04.2
        conformanceVersion = 1.4.0.0
        deviceUUID         = 8680b346-0c00-0000-0002-000000000000
        driverUUID         = 802b0057-40c2-aed9-e538-d78b797f04f4
GPU1:
        apiVersion         = 1.4.305
        driverVersion      = 0.0.1
        vendorID           = 0x10005
        deviceID           = 0x0000
        deviceType         = PHYSICAL_DEVICE_TYPE_CPU
        deviceName         = llvmpipe (LLVM 20.1.2, 256 bits)
        driverID           = DRIVER_ID_MESA_LLVMPIPE
        driverName         = llvmpipe
        driverInfo         = Mesa 25.0.7-0ubuntu0.24.04.2 (LLVM 20.1.2)
        conformanceVersion = 1.3.1.1
        deviceUUID         = 6d657361-3235-2e30-2e37-2d3075627500
        driverUUID         = 6c6c766d-7069-7065-5555-494400000000

Is this a known issue? Are there any other logs that could help track this down?

OS

Docker

GPU

Intel

CPU

Intel

Ollama version

v0.12.11

GiteaMirror added the vulkan, bug labels 2026-04-12 21:25:24 -05:00

@umutd3401 commented on GitHub (Nov 15, 2025):

See https://github.com/ggml-org/llama.cpp/issues/17056 and https://github.com/ggml-org/llama.cpp/issues/17106

I tested with Intel(R) Iris(R) Xe Graphics (TGL GT2), enabling GGML_VK_DISABLE_INTEGER_DOT_PRODUCT makes it work correctly. I don't know about the downsides.
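For reference, applied to the OP's docker-compose service the workaround would look like the following (an untested sketch; the variable name comes from the linked llama.cpp issues, and the other entries are unchanged from the OP's config):

```yml
  ollama-vulkan:
    environment:
      - OLLAMA_VULKAN=1
      - GGML_VK_DISABLE_INTEGER_DOT_PRODUCT=1  # work around gibberish output on Intel iGPUs
```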


@cibernox commented on GitHub (Nov 15, 2025):

@umutd3401 thanks! That was indeed the same error. That flag fixed it.
Performance is still quite disappointing. It's essentially the same as using pure CPU (+5% in my tests). It does use 30% less power than using the CPU and leaves all the cores free for other stuff, so it's still a win on that regard.


@cipriancraciun commented on GitHub (Nov 20, 2025):

I can confirm that an Intel Core i7-12700 iGPU, with Ollama v0.13.0 and with Gemma3 4b and 12b, outputs gibberish, and that using GGML_VK_DISABLE_INTEGER_DOT_PRODUCT=1 solves the issue.

The (truncated) output of vulkaninfo is:

VK_LAYER_MESA_device_select (Linux device selection layer) Vulkan version 1.3.211, layer version 1:
    Layer Extensions: count = 0
    Devices: count = 1
        GPU id = 0 (Intel(R) UHD Graphics 770 (ADL-S GT1))
        Layer-Device Extensions: count = 0

I can also confirm @cibernox's observation:

It's essentially the same as using pure CPU (+5% in my tests). It does use 30% less power than using the CPU a[...]

In my case (on the hardware described above) the figures are:

  • as compared to "idle", using iGPU with Vulkan, it uses 21 W more power;
  • as compared to "idle", using CPU (8 cores), it uses 77 W more power;
  • thus, the CPU uses 3.6 times more power than iGPU;
  • (these figures are taken based on the UPS readings while Ollama is running;)

However, the difference in tokens per second is also quite large:

  • with iGPU with Vulkan, Gemma3 12b, yields 2.48 tps;
  • with CPU (8 cores), Gemma3 12b, yields 4.58 tps;
  • thus, the CPU yields 1.846 times more tps than the iGPU;

Computing tokens per second per watt, the figures look like this:

  • with iGPU, 0.118 tps per watt;
  • with CPU, 0.059 tps per watt;
  • thus, the iGPU is about 1.98 times more efficient in terms of power consumption;

I hope these figures help others choosing between "speed" (which, compared with the pure CPU, the iGPU doesn't offer) and "power efficiency" (which, given the difference, isn't that astonishing either); but as the other user pointed out, at least the iGPU keeps the CPU available for other tasks.
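As a quick sanity check, the tokens-per-watt arithmetic above can be reproduced from the quoted figures (all numbers are the ones reported in this comment, for Gemma3 12b):

```python
# Sanity-check of the tokens-per-watt figures quoted above.
igpu_tps, igpu_watts = 2.48, 21   # iGPU via Vulkan, power over idle
cpu_tps, cpu_watts = 4.58, 77     # CPU (8 cores), power over idle

igpu_eff = igpu_tps / igpu_watts  # ≈ 0.118 tps per watt
cpu_eff = cpu_tps / cpu_watts     # ≈ 0.059 tps per watt

print(f"iGPU: {igpu_eff:.3f} tps/W, CPU: {cpu_eff:.3f} tps/W, "
      f"efficiency ratio: {igpu_eff / cpu_eff:.3f}x")
```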


@cibernox commented on GitHub (Nov 20, 2025):

@cipriancraciun I guess your iGPU is almost identical in performance to mine, while your CPU, being an i7 instead of my i3, is significantly more capable.

It's also possible that I'm limited by DDR4 memory bandwidth here.


@PaulEins commented on GitHub (Mar 22, 2026):

Hi, I have this problem too. I can't install Ollama on my UGreen 4800 Plus and configure it with Blinko Note from GitHub. Ollama models don't work on my hardware. I was told this is a software bug that needs fixing. Can you please fix this? Are you working on it, or should I just wait?


@jmonroynieto commented on GitHub (Apr 9, 2026):

same issue here.


Reference: github-starred/ollama#8662