[GH-ISSUE #13130] Documentation about Vulkan enablement in docker for Intel iGPU/GPU missing? #8691

Open
opened 2026-04-12 21:27:44 -05:00 by GiteaMirror · 8 comments

Originally created by @lukaszsobala on GitHub (Nov 18, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13130

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Hello,
I enabled Vulkan processing for an Intel iGPU (140V). The documentation (https://docs.ollama.com/docker#vulkan-support) says that for AMD GPUs you should run:

docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 -e OLLAMA_VULKAN=1 --name ollama ollama/ollama

but /dev/kfd (the AMD compute device) does not exist for this Intel iGPU, and it can be omitted without harm:

docker run -d --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 -e OLLAMA_VULKAN=1 --name ollama ollama/ollama

and Vulkan runs OK.
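To confirm the container actually picked up the iGPU, the server log and the loaded-model listing are enough (illustrative commands; the grep pattern is just a convenience):

docker logs ollama 2>&1 | grep -i vulkan   # Vulkan device discovery lines from the server log
docker exec -it ollama ollama ps           # after a model is loaded, shows the CPU/GPU split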

Relevant log output


OS

Linux

GPU

Intel

CPU

Intel

Ollama version

0.12.11

GiteaMirror added the gpu, docker, vulkan, bug labels 2026-04-12 21:27:44 -05:00

@rmeissn commented on GitHub (Nov 19, 2025):

I've also tried this on an Intel i7-1260P. With Vulkan enabled, both gemma3 and qwen3-vl produced only garbage, while both run correctly on the CPU (without Vulkan support).

podman run --rm --device /dev/dri -v ~/.ollama:/root/.ollama:z -p 11434:11434 -e OLLAMA_VULKAN=1 --name ollama ollama/ollama

System: Fedora 43, Framework 13; Intel Media Driver, Vulkan, mesa-vulkan & OpenVINO installed

Logs:

time=2025-11-20T09:03:24.603Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-11-20T09:03:24.604Z level=INFO source=images.go:522 msg="total blobs: 9"
time=2025-11-20T09:03:24.604Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-20T09:03:24.604Z level=INFO source=routes.go:1597 msg="Listening on [::]:11434 (version 0.12.11)"
time=2025-11-20T09:03:24.604Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-20T09:03:24.605Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 39621"
time=2025-11-20T09:03:24.793Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36389"
time=2025-11-20T09:03:24.815Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37059"
time=2025-11-20T09:03:24.852Z level=INFO source=types.go:42 msg="inference compute" id=8680a646-0c00-0000-0002-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="Intel(R) Iris(R) Xe Graphics (ADL GT2)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:00:02.0 type=iGPU total="15.5 GiB" available="14.0 GiB"
time=2025-11-20T09:03:24.852Z level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="15.5 GiB" threshold="20.0 GiB"
[GIN] 2025/11/20 - 09:03:36 | 200 |     118.277µs |       127.0.0.1 | HEAD     "/"
[GIN] 2025/11/20 - 09:03:36 | 200 |   84.656707ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/11/20 - 09:03:37 | 200 |   69.590799ms |       127.0.0.1 | POST     "/api/show"
time=2025-11-20T09:03:37.167Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 45303"
time=2025-11-20T09:03:37.321Z level=INFO source=server.go:209 msg="enabling flash attention"
time=2025-11-20T09:03:37.321Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-7cd4618c1faf8b7233c6c906dac1694b6a47684b37b8895d470ac688520b9c01 --port 35477"
time=2025-11-20T09:03:37.321Z level=INFO source=sched.go:443 msg="system memory" total="31.0 GiB" free="25.7 GiB" free_swap="31.5 GiB"
time=2025-11-20T09:03:37.321Z level=INFO source=sched.go:450 msg="gpu memory" id=8680a646-0c00-0000-0002-000000000000 library=Vulkan available="13.5 GiB" free="14.0 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-11-20T09:03:37.321Z level=INFO source=server.go:702 msg="loading model" "model layers"=27 requested=-1
time=2025-11-20T09:03:37.331Z level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-11-20T09:03:37.332Z level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:35477"
time=2025-11-20T09:03:37.343Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:4 GPULayers:27[ID:8680a646-0c00-0000-0002-000000000000 Layers:27(0..26)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-20T09:03:37.391Z level=INFO source=ggml.go:136 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=340 num_key_values=32
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-alderlake.so
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Iris(R) Xe Graphics (ADL GT2) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from /usr/lib/ollama/vulkan/libggml-vulkan.so
time=2025-11-20T09:03:37.409Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
ggml_backend_vk_get_device_memory called: uuid 8680a646-0c00-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-20T09:03:37.429Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:4 GPULayers:27[ID:8680a646-0c00-0000-0002-000000000000 Layers:27(0..26)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 8680a646-0c00-0000-0002-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-20T09:03:37.618Z level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="762.5 MiB"
time=2025-11-20T09:03:37.618Z level=INFO source=device.go:245 msg="model weights" device=CPU size="306.0 MiB"
time=2025-11-20T09:03:37.618Z level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="38.0 MiB"
time=2025-11-20T09:03:37.618Z level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="62.0 MiB"
time=2025-11-20T09:03:37.618Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="2.2 MiB"
time=2025-11-20T09:03:37.618Z level=INFO source=device.go:272 msg="total memory" size="1.1 GiB"
time=2025-11-20T09:03:37.618Z level=INFO source=runner.go:1271 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:4 GPULayers:27[ID:8680a646-0c00-0000-0002-000000000000 Layers:27(0..26)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-20T09:03:37.618Z level=INFO source=ggml.go:482 msg="offloading 26 repeating layers to GPU"
time=2025-11-20T09:03:37.618Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
time=2025-11-20T09:03:37.618Z level=INFO source=ggml.go:494 msg="offloaded 27/27 layers to GPU"
time=2025-11-20T09:03:37.618Z level=INFO source=sched.go:517 msg="loaded runners" count=1
time=2025-11-20T09:03:37.618Z level=INFO source=server.go:1294 msg="waiting for llama runner to start responding"
time=2025-11-20T09:03:37.619Z level=INFO source=server.go:1328 msg="waiting for server to become available" status="llm server loading model"
time=2025-11-20T09:03:41.391Z level=INFO source=server.go:1332 msg="llama runner started in 4.07 seconds"
[GIN] 2025/11/20 - 09:03:41 | 200 |  4.375345027s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2025/11/20 - 09:03:48 | 200 |  1.553890921s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/11/20 - 09:03:57 | 200 |  1.390181221s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/11/20 - 09:04:10 | 200 |      20.399µs |       127.0.0.1 | HEAD     "/"
podman exec -it ollama ollama run gemma3:1b
>>> Hello
</strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong</strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong></strong>

>>> Who are you?
 Development

(the remainder of the response is dozens of blank and whitespace-only lines)
>>>

@lukaszsobala commented on GitHub (Nov 20, 2025):

Maybe some more context is necessary.

The full run command is:

docker run -d --device /dev/dri --cpuset-cpus="0-3" -v ollama:/root/.ollama -e OLLAMA_FLASH_ATTENTION=1 -e OLLAMA_VULKAN=1 -v /somefolder/fake_cpuinfo:/proc/cpuinfo -p 11434:11434 --name ollama --restart=unless-stopped ollama/ollama

as per this gist (https://gist.github.com/lukaszsobala/9d985c4cd294dcaecd68328fcc068935), where I show how to fake the CPU info so that Ollama previously ran properly on the CPU. "somefolder" is obviously a placeholder.
The operating system is Ubuntu 25.10, kernel 6.17.0-6-generic. Intel oneAPI is installed from Intel's repository (probably irrelevant).
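The core of the gist, as a rough sketch (the gist has the actual steps; treat this as illustrative):

cp /proc/cpuinfo /somefolder/fake_cpuinfo
# edit fake_cpuinfo so the container sees the CPU info you want, then bind-mount it
# with -v /somefolder/fake_cpuinfo:/proc/cpuinfo as in the run command above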

@rmeissn is this intentional? -v ~/.ollama:/root/.ollama:z

vulkaninfo excerpt:

Vulkan Instance Version: 1.4.321


Instance Extensions: count = 24
===============================
        VK_EXT_acquire_drm_display             : extension revision 1
        VK_EXT_acquire_xlib_display            : extension revision 1
        VK_EXT_debug_report                    : extension revision 10
        VK_EXT_debug_utils                     : extension revision 2
        VK_EXT_direct_mode_display             : extension revision 1
        VK_EXT_display_surface_counter         : extension revision 1
        VK_EXT_headless_surface                : extension revision 1
        VK_EXT_surface_maintenance1            : extension revision 1
        VK_EXT_swapchain_colorspace            : extension revision 5
        VK_KHR_device_group_creation           : extension revision 1
        VK_KHR_display                         : extension revision 23
        VK_KHR_external_fence_capabilities     : extension revision 1
        VK_KHR_external_memory_capabilities    : extension revision 1
        VK_KHR_external_semaphore_capabilities : extension revision 1
        VK_KHR_get_display_properties2         : extension revision 1
        VK_KHR_get_physical_device_properties2 : extension revision 2
        VK_KHR_get_surface_capabilities2       : extension revision 1
        VK_KHR_portability_enumeration         : extension revision 1
        VK_KHR_surface                         : extension revision 25
        VK_KHR_surface_protected_capabilities  : extension revision 1
        VK_KHR_wayland_surface                 : extension revision 6
        VK_KHR_xcb_surface                     : extension revision 6
        VK_KHR_xlib_surface                    : extension revision 6
        VK_LUNARG_direct_driver_loading        : extension revision 1

@rmeissn commented on GitHub (Nov 21, 2025):

@lukaszsobala Yes, this is intentional. I'm saving the downloaded models in a specific place, and the :z is required because SELinux is enabled.

I tried adding your OLLAMA_FLASH_ATTENTION=1 parameter, but the models still produce garbage. I don't expect that restricting the CPUs will help with the garbage output, but it might help avoid the efficiency cores.

vulkaninfo excerpt on my system:

==========
VULKANINFO
==========

Vulkan Instance Version: 1.4.321


Instance Extensions: count = 24
===============================
        VK_EXT_acquire_drm_display             : extension revision 1
        VK_EXT_acquire_xlib_display            : extension revision 1
        VK_EXT_debug_report                    : extension revision 10
        VK_EXT_debug_utils                     : extension revision 2
        VK_EXT_direct_mode_display             : extension revision 1
        VK_EXT_display_surface_counter         : extension revision 1
        VK_EXT_headless_surface                : extension revision 1
        VK_EXT_surface_maintenance1            : extension revision 1
        VK_EXT_swapchain_colorspace            : extension revision 5
        VK_KHR_device_group_creation           : extension revision 1
        VK_KHR_display                         : extension revision 23
        VK_KHR_external_fence_capabilities     : extension revision 1
        VK_KHR_external_memory_capabilities    : extension revision 1
        VK_KHR_external_semaphore_capabilities : extension revision 1
        VK_KHR_get_display_properties2         : extension revision 1
        VK_KHR_get_physical_device_properties2 : extension revision 2
        VK_KHR_get_surface_capabilities2       : extension revision 1
        VK_KHR_portability_enumeration         : extension revision 1
        VK_KHR_surface                         : extension revision 25
        VK_KHR_surface_protected_capabilities  : extension revision 1
        VK_KHR_wayland_surface                 : extension revision 6
        VK_KHR_xcb_surface                     : extension revision 6
        VK_KHR_xlib_surface                    : extension revision 6
        VK_LUNARG_direct_driver_loading        : extension revision 1

@retblast commented on GitHub (Nov 23, 2025):

@rmeissn you need GGML_VK_DISABLE_INTEGER_DOT_PRODUCT=1, relevant: https://github.com/ollama/ollama/issues/13086
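For the container setup above, that means one more -e flag (a sketch based on the podman command earlier in this thread):

podman run --rm --device /dev/dri -v ~/.ollama:/root/.ollama:z -p 11434:11434 \
  -e OLLAMA_VULKAN=1 -e GGML_VK_DISABLE_INTEGER_DOT_PRODUCT=1 \
  --name ollama ollama/ollama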


@rmeissn commented on GitHub (Dec 11, 2025):

> @rmeissn you need GGML_VK_DISABLE_INTEGER_DOT_PRODUCT=1, relevant: #13086

Using this as an environment variable, Ollama successfully executes models.
Side note: unfortunately, Vulkan is slower than CPU execution on my system (i7-1260P).
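A quick way to compare the two is the timing summary that the --verbose flag prints, e.g.:

podman exec -it ollama ollama run gemma3:1b --verbose   # prints eval rate (tokens/s) after each response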


@retblast commented on GitHub (Dec 12, 2025):

Yeah, Vulkan performance for this on Intel platforms is just uh... super bad. Relevant: https://www.phoronix.com/review/llama-cpp-vulkan-eoy2025


@lukaszsobala commented on GitHub (Dec 12, 2025):

> Yeah, Vulkan performance for this on Intel platforms is just uh... super bad.

I do not agree. On the 258V, Vulkan inference is quite a bit faster than the CPU; it's just that many iGPUs are underpowered.
I am wondering, then, whether the documentation is planned to be updated. It seems that the GGML_VK_DISABLE_INTEGER_DOT_PRODUCT=1 option should be recommended for some Intel GPUs, while others don't need it. Battlemage (140V) surely does not, but we would need to determine which do; a possible check is sketched below.
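One way to narrow it down, assuming the relevant signal is the integer dot product capability that ggml already reports per device ("int dot: 0/1" in the logs above) and that vulkaninfo exposes as an extension:

docker logs ollama 2>&1 | grep "int dot"          # ggml's per-device integer dot product flag
vulkaninfo | grep -i shader_integer_dot_product   # VK_KHR_shader_integer_dot_product in the extension list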


@retblast commented on GitHub (Dec 13, 2025):

Fair enough, I don't have MTL or LNL, ha.
