[GH-ISSUE #13086] Vulkan on intel iGPU results in gibberish #8662

Closed
opened 2026-04-12 21:25:24 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @cibernox on GitHub (Nov 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13086

What is the issue?

I have a Proxmox server with an Intel 12th-gen i3 and an RTX 3060.

On that server I have an LXC container where I run Portainer, and I pass through both the NVIDIA GPU and the Intel iGPU to the container.

I've been running Ollama with CUDA on the NVIDIA card flawlessly for months.

Now that Ollama supports Vulkan, I wanted to also run a second Ollama container using Vulkan on the iGPU for smaller models.

When I run that second instance without Vulkan (meaning it runs on the CPU), models work fine. When I enable Vulkan I see no errors anywhere and the models run, but they output absolute garbage. I'm using qwen3:4B in Q4_K_M for my tests.

For completeness, this is the service entry in my docker-compose file, where I added a few flags just to be extra sure Ollama was using the right GPU, though I think it was using it even without them:

  ollama-vulkan:
    image: ollama/ollama
    container_name: ollama-vulkan
    restart: unless-stopped
    ports:
      - "11436:11434"
    volumes:
      - ./models:/root/.ollama # Same volume as the Cuda-based ollama, save HDD space.
    devices:
      - /dev/dri:/dev/dri
    environment:
      - OLLAMA_DEBUG=1
      - OLLAMA_VULKAN=1  
      - GGML_VK_VISIBLE_DEVICES=0  # Force GPU0 (Intel)
      - GGML_VULKAN_DEVICE=0        # Alternative variable
      - VK_DEVICE_SELECT_FORCE_DEFAULT_DEVICE=1  # Force first physical GPU  

Relevant log output

Nothing in the Ollama server logs seems particularly odd, and Ollama seems to detect the
"Intel(R) Graphics (ADL GT2)" iGPU.
There are some mentions of CUDA, but I assume those libraries are not used when Vulkan is enabled.

time=2025-11-14T12:55:52.272Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"

time=2025-11-14T12:55:52.272Z level=INFO source=images.go:522 msg="total blobs: 13"

time=2025-11-14T12:55:52.273Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"

time=2025-11-14T12:55:52.273Z level=INFO source=routes.go:1597 msg="Listening on [::]:11434 (version 0.12.11)"

time=2025-11-14T12:55:52.273Z level=DEBUG source=sched.go:120 msg="starting llm scheduler"

time=2025-11-14T12:55:52.273Z level=INFO source=runner.go:67 msg="discovering available GPUs..."

time=2025-11-14T12:55:52.273Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38321"

time=2025-11-14T12:55:52.273Z level=DEBUG source=server.go:393 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 GGML_VK_VISIBLE_DEVICES=0 GGML_VULKAN_DEVICE=0 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13

time=2025-11-14T12:55:52.305Z level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=31.776715ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[]

time=2025-11-14T12:55:52.305Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35489"

time=2025-11-14T12:55:52.305Z level=DEBUG source=server.go:393 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 GGML_VK_VISIBLE_DEVICES=0 GGML_VULKAN_DEVICE=0 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/vulkan:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/vulkan

time=2025-11-14T12:55:52.379Z level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=73.997807ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/vulkan]" extra_envs=map[]

time=2025-11-14T12:55:52.379Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40445"

time=2025-11-14T12:55:52.379Z level=DEBUG source=server.go:393 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 OLLAMA_VULKAN=1 GGML_VK_VISIBLE_DEVICES=0 GGML_VULKAN_DEVICE=0 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12

time=2025-11-14T12:55:52.396Z level=DEBUG source=runner.go:418 msg="bootstrap discovery took" duration=17.406299ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]

time=2025-11-14T12:55:52.396Z level=DEBUG source=runner.go:116 msg="evluating which if any devices to filter out" initial_count=1

time=2025-11-14T12:55:52.396Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=123.565523ms

time=2025-11-14T12:55:52.396Z level=INFO source=types.go:42 msg="inference compute" id=8680b346-0c00-0000-0002-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="Intel(R) Graphics (ADL GT2)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:00:02.0 type=iGPU total="15.4 GiB" available="13.5 GiB"

time=2025-11-14T12:55:52.396Z level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="15.4 GiB" threshold="20.0 GiB"

I SSH'd into the Docker container and ran vulkaninfo (which I had to install) to be sure everything looked right, and to me it does:

Vulkan Instance Version: 1.3.275


Instance Extensions: count = 24
-------------------------------
VK_EXT_acquire_drm_display             : extension revision 1
VK_EXT_acquire_xlib_display            : extension revision 1
VK_EXT_debug_report                    : extension revision 10
VK_EXT_debug_utils                     : extension revision 2
VK_EXT_direct_mode_display             : extension revision 1
VK_EXT_display_surface_counter         : extension revision 1
VK_EXT_headless_surface                : extension revision 1
VK_EXT_surface_maintenance1            : extension revision 1
VK_EXT_swapchain_colorspace            : extension revision 5
VK_KHR_device_group_creation           : extension revision 1
VK_KHR_display                         : extension revision 23
VK_KHR_external_fence_capabilities     : extension revision 1
VK_KHR_external_memory_capabilities    : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_display_properties2         : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2       : extension revision 1
VK_KHR_portability_enumeration         : extension revision 1
VK_KHR_surface                         : extension revision 25
VK_KHR_surface_protected_capabilities  : extension revision 1
VK_KHR_wayland_surface                 : extension revision 6
VK_KHR_xcb_surface                     : extension revision 6
VK_KHR_xlib_surface                    : extension revision 6
VK_LUNARG_direct_driver_loading        : extension revision 1

Instance Layers: count = 3
--------------------------
VK_LAYER_INTEL_nullhw       INTEL NULL HW                1.1.73   version 1
VK_LAYER_MESA_device_select Linux device selection layer 1.4.303  version 1
VK_LAYER_MESA_overlay       Mesa Overlay layer           1.4.303  version 1

Devices:
========
GPU0:
        apiVersion         = 1.4.305
        driverVersion      = 25.0.7
        vendorID           = 0x8086
        deviceID           = 0x46b3
        deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName         = Intel(R) Graphics (ADL GT2)
        driverID           = DRIVER_ID_INTEL_OPEN_SOURCE_MESA
        driverName         = Intel open-source Mesa driver
        driverInfo         = Mesa 25.0.7-0ubuntu0.24.04.2
        conformanceVersion = 1.4.0.0
        deviceUUID         = 8680b346-0c00-0000-0002-000000000000
        driverUUID         = 802b0057-40c2-aed9-e538-d78b797f04f4
GPU1:
        apiVersion         = 1.4.305
        driverVersion      = 0.0.1
        vendorID           = 0x10005
        deviceID           = 0x0000
        deviceType         = PHYSICAL_DEVICE_TYPE_CPU
        deviceName         = llvmpipe (LLVM 20.1.2, 256 bits)
        driverID           = DRIVER_ID_MESA_LLVMPIPE
        driverName         = llvmpipe
        driverInfo         = Mesa 25.0.7-0ubuntu0.24.04.2 (LLVM 20.1.2)
        conformanceVersion = 1.3.1.1
        deviceUUID         = 6d657361-3235-2e30-2e37-2d3075627500
        driverUUID         = 6c6c766d-7069-7065-5555-494400000000

Is this a known issue? Are there any other logs that could help track this down?

OS

Docker

GPU

Intel

CPU

Intel

Ollama version

v0.12.11

GiteaMirror added the vulkan, bug labels 2026-04-12 21:25:24 -05:00

@umutd3401 commented on GitHub (Nov 15, 2025):

See https://github.com/ggml-org/llama.cpp/issues/17056 and https://github.com/ggml-org/llama.cpp/issues/17106

I tested with Intel(R) Iris(R) Xe Graphics (TGL GT2), enabling GGML_VK_DISABLE_INTEGER_DOT_PRODUCT makes it work correctly. I don't know about the downsides.
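For reference, applied to the OP's docker-compose service the workaround would look like the following (an untested sketch; the variable name comes from the linked llama.cpp issues, and the other entries are unchanged from the OP's config):

```yml
  ollama-vulkan:
    environment:
      - OLLAMA_VULKAN=1
      - GGML_VK_DISABLE_INTEGER_DOT_PRODUCT=1  # work around gibberish output on Intel iGPUs
```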


@cibernox commented on GitHub (Nov 15, 2025):

@umutd3401 thanks! That was indeed the same error. That flag fixed it.
Performance is still quite disappointing. It's essentially the same as using pure CPU (+5% in my tests). It does use 30% less power than using the CPU and leaves all the cores free for other stuff, so it's still a win on that regard.


@cipriancraciun commented on GitHub (Nov 20, 2025):

I can confirm that an Intel Core i7-12700 iGPU, with Ollama v0.13.0 and with Gemma3 4b and 12b, outputs gibberish, and that using GGML_VK_DISABLE_INTEGER_DOT_PRODUCT=1 solves the issue.

The (truncated) output of vulkaninfo is:

VK_LAYER_MESA_device_select (Linux device selection layer) Vulkan version 1.3.211, layer version 1:
    Layer Extensions: count = 0
    Devices: count = 1
        GPU id = 0 (Intel(R) UHD Graphics 770 (ADL-S GT1))
        Layer-Device Extensions: count = 0

I can also confirm @cibernox's observation:

It's essentially the same as using pure CPU (+5% in my tests). It does use 30% less power than using the CPU a[...]

In my case (on the hardware described above) the figures are:

  • as compared to "idle", using iGPU with Vulkan, it uses 21 W more power;
  • as compared to "idle", using CPU (8 cores), it uses 77 W more power;
  • thus, the CPU uses 3.6 times more power than iGPU;
  • (these figures are taken based on the UPS readings while Ollama is running;)

However, the difference in tokens per second is also quite large:

  • with iGPU with Vulkan, Gemma3 12b, yields 2.48 tps;
  • with CPU (8 cores), Gemma3 12b, yields 4.58 tps;
  • thus, the CPU yields 1.846 times more tps than the iGPU;

Computing tokens per second per watt, the figures look like this:

  • with iGPU, 0.118 tps per watt;
  • with CPU, 0.059 tps per watt;
  • thus, the iGPU is about 1.98 times more efficient in terms of power consumption;

I hope these figures help others choosing between "speed" (which, compared with the pure CPU, the iGPU doesn't offer) and "power efficiency" (which, given the difference, isn't that astonishing either); but as the other user pointed out, at least the iGPU keeps the CPU available for other tasks.
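As a quick sanity check, the tokens-per-watt arithmetic above can be reproduced from the quoted figures (all numbers are the ones reported in this comment, for Gemma3 12b):

```python
# Sanity-check of the tokens-per-watt figures quoted above.
igpu_tps, igpu_watts = 2.48, 21   # iGPU via Vulkan, power over idle
cpu_tps, cpu_watts = 4.58, 77     # CPU (8 cores), power over idle

igpu_eff = igpu_tps / igpu_watts  # ≈ 0.118 tps per watt
cpu_eff = cpu_tps / cpu_watts     # ≈ 0.059 tps per watt

print(f"iGPU: {igpu_eff:.3f} tps/W, CPU: {cpu_eff:.3f} tps/W, "
      f"efficiency ratio: {igpu_eff / cpu_eff:.3f}x")
```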


@cibernox commented on GitHub (Nov 20, 2025):

@cipriancraciun I guess your iGPU is almost identical in performance to mine, while your CPU, being an i7 instead of my i3, is significantly more capable.

It's also possible that I'm limited by DDR4 memory bandwidth here.


@PaulEins commented on GitHub (Mar 22, 2026):

Hi, I have this problem too. I can't install Ollama on my UGreen 4800 Plus and configure it with Blinko Note from GitHub. Ollama models don't work on my hardware. I was told this is a software bug that needs fixing. Can you please fix this? Are you working on it, or should I just wait?


@jmonroynieto commented on GitHub (Apr 9, 2026):

same issue here.


Reference: github-starred/ollama#8662