[GH-ISSUE #12889] Ollama 0.12.6 runs perfectly on Intel GPU, but 0.12.7 and 0.12.8 do not; please fix this and fully support Intel GPUs in Ollama #8542

Closed
opened 2026-04-12 21:14:57 -05:00 by GiteaMirror · 9 comments

Originally created by @Swagatade on GitHub (Oct 31, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12889

GiteaMirror added the feature request and needs more info labels 2026-04-12 21:14:57 -05:00

@abcbarryn commented on GitHub (Nov 1, 2025):

Additionally, any version of Ollama later than 0.11.2 has issues building/compiling against CUDA 11.8. See https://github.com/ollama/ollama/issues/12872.
Also, 1800+ open issues? It looks like this software is becoming VERY unstable. :(


@2570165831 commented on GitHub (Nov 1, 2025):

The same issue exists on Mac (Apple Silicon).


@dhiltgen commented on GitHub (Nov 4, 2025):

@Swagatade can you clarify? Intel GPUs are not yet supported, so I'm not sure how you are able to run inference on an Intel GPU on 0.12.6. Vulkan support is coming soon, which will enable Intel GPUs.
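
A quick way to verify where a loaded model actually lives is the server's /api/ps endpoint, which reports how much of each loaded model is resident in VRAM. Below is a minimal sketch in Python; the size and size_vram field names follow the Ollama API docs at the time of writing, so treat them as assumptions to verify against your version.

```
# Minimal sketch: ask a running Ollama server where loaded models live.
# Assumes the documented /api/ps endpoint and its size/size_vram fields;
# verify against your Ollama version's API docs.
import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:11434/api/ps") as resp:
    data = json.load(resp)

for m in data.get("models", []):
    size = m.get("size", 0)
    vram = m.get("size_vram", 0)
    where = "GPU" if vram and vram >= size else ("partial GPU" if vram else "CPU")
    print(f"{m.get('name')}: {vram}/{size} bytes in VRAM -> {where}")
```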


@Swagatade commented on GitHub (Nov 4, 2025):

> @Swagatade can you clarify? Intel GPUs are not yet supported, so I'm not sure how you are able to run inference on an Intel GPU on 0.12.6. Vulkan support is coming soon, which will enable Intel GPUs.

I just installed and ran Ollama. My CPU is an Intel Core Ultra 5 228V, so I am using the iGPU, not a dGPU. The Intel iGPU is Arc, so it automatically runs the 20B gpt-oss model effectively. I am using a Lenovo Yoga model....


@dhiltgen commented on GitHub (Nov 4, 2025):

@Swagatade I believe it's running on your CPU - can you share your server log?
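
For anyone gathering the same information: on Windows the server log is typically written to %LOCALAPPDATA%\Ollama\server.log (per Ollama's troubleshooting docs). A minimal sketch, assuming that default path, that filters out the GPU-related lines:

```
# Minimal sketch: extract GPU-discovery lines from the Ollama server log on
# Windows. The default log path follows Ollama's troubleshooting docs;
# adjust if your install differs.
import os
from pathlib import Path

log_path = Path(os.environ["LOCALAPPDATA"]) / "Ollama" / "server.log"
for line in log_path.read_text(encoding="utf-8", errors="replace").splitlines():
    if "inference compute" in line or "offloaded" in line or "low vram" in line:
        print(line)
```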


@Swagatade commented on GitHub (Nov 5, 2025):

> @Swagatade I believe it's running on your CPU - can you share your server log?

@dhiltgen
C:\Users\swaga_nsppntr>ollama serve
time=2025-11-05T06:44:18.198+05:30 level=INFO source=routes.go:1524 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\Users\swaga_nsppntr\.ollama\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-11-05T06:44:18.354+05:30 level=INFO source=images.go:522 msg="total blobs: 51"
time=2025-11-05T06:44:18.357+05:30 level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-05T06:44:18.361+05:30 level=INFO source=routes.go:1577 msg="Listening on 127.0.0.1:11434 (version 0.12.9)"
time=2025-11-05T06:44:18.363+05:30 level=INFO source=runner.go:76 msg="discovering available GPUs..."
time=2025-11-05T06:44:18.372+05:30 level=INFO source=server.go:400 msg="starting runner" cmd="C:\Users\swaga_nsppntr\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55453"
time=2025-11-05T06:44:21.018+05:30 level=INFO source=server.go:400 msg="starting runner" cmd="C:\Users\swaga_nsppntr\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55466"
time=2025-11-05T06:44:21.339+05:30 level=INFO source=server.go:400 msg="starting runner" cmd="C:\Users\swaga_nsppntr\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 55474"
time=2025-11-05T06:44:21.622+05:30 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="31.5 GiB" available="16.4 GiB"
time=2025-11-05T06:44:21.623+05:30 level=INFO source=routes.go:1618 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
[GIN] 2025/11/05 - 06:44:41 | 200 | 2.2735ms | 127.0.0.1 | HEAD "/"
[GIN] 2025/11/05 - 06:44:41 | 200 | 197.4992ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2025/11/05 - 06:45:10 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2025/11/05 - 06:45:10 | 200 | 203.4733ms | 127.0.0.1 | POST "/api/show"
[GIN] 2025/11/05 - 06:45:11 | 200 | 102.8877ms | 127.0.0.1 | POST "/api/show"
time=2025-11-05T06:45:11.257+05:30 level=INFO source=cpu_windows.go:139 msg=packages count=1
time=2025-11-05T06:45:11.257+05:30 level=INFO source=cpu_windows.go:155 msg="efficiency cores detected" maxEfficiencyClass=1
time=2025-11-05T06:45:11.257+05:30 level=INFO source=cpu_windows.go:186 msg="" package=0 cores=8 efficiency=4 threads=8
time=2025-11-05T06:45:11.412+05:30 level=INFO source=server.go:215 msg="enabling flash attention"
time=2025-11-05T06:45:11.420+05:30 level=INFO source=server.go:400 msg="starting runner" cmd="C:\Users\swaga_nsppntr\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --model C:\Users\swaga_nsppntr\.ollama\models\blobs\sha256-c4016c9e54d0a9218b5911790579e58284a9ed57c48b7e87607125c6307f9da1 --port 61014"
time=2025-11-05T06:45:11.426+05:30 level=INFO source=server.go:653 msg="loading model" "model layers"=25 requested=-1
time=2025-11-05T06:45:11.426+05:30 level=INFO source=server.go:658 msg="system memory" total="31.5 GiB" free="16.7 GiB" free_swap="34.9 GiB"
time=2025-11-05T06:45:11.477+05:30 level=INFO source=runner.go:1349 msg="starting ollama engine"
time=2025-11-05T06:45:11.478+05:30 level=INFO source=runner.go:1384 msg="Server listening on 127.0.0.1:61014"
time=2025-11-05T06:45:11.486+05:30 level=INFO source=runner.go:1222 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:4 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-05T06:45:11.527+05:30 level=INFO source=ggml.go:136 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=459 num_key_values=32
load_backend: loaded CPU backend from C:\Users\swaga_nsppntr\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
time=2025-11-05T06:45:11.545+05:30 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(clang)
time=2025-11-05T06:45:11.570+05:30 level=INFO source=runner.go:1222 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:4 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-05T06:45:11.691+05:30 level=INFO source=runner.go:1222 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:4 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-05T06:45:11.692+05:30 level=INFO source=device.go:217 msg="model weights" device=CPU size="12.8 GiB"
time=2025-11-05T06:45:11.692+05:30 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2025-11-05T06:45:11.693+05:30 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2025-11-05T06:45:11.693+05:30 level=INFO source=ggml.go:494 msg="offloaded 0/25 layers to GPU"
time=2025-11-05T06:45:11.693+05:30 level=INFO source=device.go:228 msg="kv cache" device=CPU size="192.0 MiB"
time=2025-11-05T06:45:11.694+05:30 level=INFO source=device.go:239 msg="compute graph" device=CPU size="94.8 MiB"
time=2025-11-05T06:45:11.694+05:30 level=INFO source=device.go:244 msg="total memory" size="13.1 GiB"
time=2025-11-05T06:45:11.694+05:30 level=INFO source=sched.go:493 msg="loaded runners" count=1
time=2025-11-05T06:45:11.695+05:30 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-05T06:45:11.696+05:30 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
time=2025-11-05T06:45:17.300+05:30 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server not responding"
time=2025-11-05T06:45:17.935+05:30 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
time=2025-11-05T06:45:20.201+05:30 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server not responding"
time=2025-11-05T06:45:20.461+05:30 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
time=2025-11-05T06:45:21.966+05:30 level=INFO source=server.go:1289 msg="llama runner started in 10.54 seconds"
[GIN] 2025/11/05 - 06:45:21 | 200 | 10.8840579s | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/11/05 - 06:45:46 | 200 | 9.886854s | 127.0.0.1 | POST "/api/chat"
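
The decisive entries in a log like the one above are the "inference compute" line (id=cpu, library=cpu) and "offloaded 0/25 layers to GPU": together they show the model ran entirely on the CPU. A minimal sketch that pulls the offload ratio out of a saved log; the server.log filename is an assumption, so point it at wherever the log was captured:

```
# Minimal sketch: pull the layer-offload ratio out of a saved Ollama log.
# "offloaded 0/25 layers to GPU" means the model ran entirely on the CPU.
import re

with open("server.log", encoding="utf-8", errors="replace") as f:
    text = f.read()

m = re.search(r"offloaded (\d+)/(\d+) layers to GPU", text)
if m:
    on_gpu, total = map(int, m.groups())
    verdict = "CPU-only inference" if on_gpu == 0 else f"{on_gpu} layers on GPU"
    print(f"offloaded {on_gpu}/{total} layers to GPU -> {verdict}")
```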


@Swagatade commented on GitHub (Nov 5, 2025):

@dhiltgen Update: Ollama 0.12.9 now works properly on my Intel CPU.

https://github.com/user-attachments/assets/378981ba-fafc-4ccb-b245-559fc21ec8f7


@dhiltgen commented on GitHub (Nov 5, 2025):

Happy to hear things are working on the latest version. Just to confirm, from your logs, yes, you are running inference on the CPU, not the GPU:

time=2025-11-05T06:44:21.622+05:30 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="31.5 GiB" available="16.4 GiB"

Once we enable Vulkan you should be able to start running inference on your Intel GPU.
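
Once GPU support lands, one way to quantify the difference between versions is to time generation through the API: the /api/generate response includes eval_count and eval_duration fields (the latter in nanoseconds, per the Ollama API docs). A minimal sketch; the model name is just an example taken from this thread:

```
# Minimal sketch: measure generation speed via /api/generate so runs on
# different Ollama versions (CPU today vs. a future Vulkan/GPU build) can
# be compared. Field names follow the Ollama API docs; verify per version.
import json
import urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=json.dumps({
        "model": "gpt-oss:20b",  # example; any locally pulled model works
        "prompt": "Say hello.",
        "stream": False,
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    out = json.load(resp)

tokens = out.get("eval_count", 0)
seconds = out.get("eval_duration", 0) / 1e9  # eval_duration is in nanoseconds
if seconds:
    print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tok/s")
```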


@Swagatade commented on GitHub (Nov 5, 2025):

> Happy to hear things are working on the latest version. Just to confirm, from your logs, yes, you are running inference on the CPU, not the GPU:
>
> time=2025-11-05T06:44:21.622+05:30 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="31.5 GiB" available="16.4 GiB"
>
> Once we enable Vulkan you should be able to start running inference on your Intel GPU.

Can you suggest how I could collaborate full-time with Ollama as a developer? I am a college student. I saw your LinkedIn profile.


Reference: github-starred/ollama#8542