[GH-ISSUE #10506] Ollama not detecting AVX2 #6913

Closed
opened 2026-04-12 18:48:12 -05:00 by GiteaMirror · 4 comments

Originally created by @aaronoaks on GitHub (Apr 30, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10506

What is the issue?

I'm trying to run Ollama on a Dell Latitude laptop with integrated Intel Iris Xe graphics. I wasn't expecting any GPU support there, but I found my testing was extremely slow. When I looked at the logs I saw it was using `library=cpu variant=""`, not the AVX2 variant. Looking at other issues, I saw that most log outputs have a "Dynamic LLM libraries" line like this:

> Nov 14 08:53:09 tal-ai-server ollama[1296]: time=2024-11-14T08:53:09.136+09:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[rocm cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12]"

but a similar line does not appear in my log output. The CPU is an Intel Core i7-1365U, which Intel says supports AVX2.

Is there something I am missing to get Ollama to perform this detection and use AVX2?
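For reference: one quick way to confirm, independently of Ollama, that the OS actually exposes these instruction sets to a Go process is the `golang.org/x/sys/cpu` package. A minimal diagnostic sketch (this is not Ollama's own detection code):

```go
// cpucheck.go: print a few x86 feature flags as reported by
// golang.org/x/sys/cpu. Diagnostic sketch only; this is not
// how Ollama itself performs feature detection.
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

func main() {
	fmt.Println("SSE3: ", cpu.X86.HasSSE3)
	fmt.Println("SSSE3:", cpu.X86.HasSSSE3)
	fmt.Println("AVX:  ", cpu.X86.HasAVX)
	fmt.Println("AVX2: ", cpu.X86.HasAVX2)
	fmt.Println("FMA:  ", cpu.X86.HasFMA)
	fmt.Println("BMI2: ", cpu.X86.HasBMI2)
}
```

On an i7-1365U all of these should print `true`.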

Relevant log output

```shell
ollama serve
2025/04/30 12:09:09 routes.go:1232: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\aoaks\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-04-30T12:09:09.236-05:00 level=INFO source=images.go:458 msg="total blobs: 0"
time=2025-04-30T12:09:09.236-05:00 level=INFO source=images.go:465 msg="total unused blobs removed: 0"
time=2025-04-30T12:09:09.237-05:00 level=INFO source=routes.go:1299 msg="Listening on 127.0.0.1:11434 (version 0.6.6)"
time=2025-04-30T12:09:09.237-05:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-04-30T12:09:09.237-05:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-04-30T12:09:09.237-05:00 level=INFO source=gpu_windows.go:183 msg="efficiency cores detected" maxEfficiencyClass=1
time=2025-04-30T12:09:09.238-05:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=10 efficiency=8 threads=12
time=2025-04-30T12:09:09.253-05:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
time=2025-04-30T12:09:09.253-05:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="31.4 GiB" available="13.7 GiB"
```

OS

Windows

GPU

Intel

CPU

Intel

Ollama version

0.6.6

GiteaMirror added the bug label 2026-04-12 18:48:12 -05:00

@rick-github commented on GitHub (Apr 30, 2025):

The runner architecture has changed: there are no longer separate runner variants for different CPU architectures. Instead, the runner dynamically loads a library that implements the architecture-dependent processing. Load a model and look for a line in the logs with `compiler=` in it:

```
time=2025-04-30T16:45:32.662Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
```
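In other words, feature detection now happens at runtime, when a model is loaded. A minimal Go sketch of the dispatch idea; the variant names below are hypothetical, not Ollama's actual library names:

```go
// Illustration of runtime dispatch on CPU features: pick the most
// capable backend variant the host supports, instead of shipping
// separate per-architecture runner binaries. Names are made up.
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

// pickVariant returns the hypothetical backend library to load,
// checked from most to least capable.
func pickVariant() string {
	switch {
	case cpu.X86.HasAVX2:
		return "cpu_avx2"
	case cpu.X86.HasAVX:
		return "cpu_avx"
	default:
		return "cpu"
	}
}

func main() {
	fmt.Println("would load variant:", pickVariant())
}
```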

@aaronoaks commented on GitHub (Apr 30, 2025):

Thanks for the fast response! I checked the logs and found the corresponding line:

```
time=2025-04-30T17:39:10.236Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
```

I'm assuming the `=1` means it will be able to use the associated instruction sets?


@rick-github commented on GitHub (Apr 30, 2025):

Yep.
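For completeness: the flags are plain space-separated `key=value` pairs, so it's easy to pull the enabled CPU features out of that log line. A tiny sketch, assuming the `msg=system` line has already been isolated:

```go
// Extract the CPU feature flags that are enabled (=1) from the
// space-separated key=value pairs of a "msg=system" log line.
package main

import (
	"fmt"
	"strings"
)

func main() {
	line := `CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.FMA=1 compiler=cgo(gcc)`
	for _, field := range strings.Fields(line) {
		key, val, ok := strings.Cut(field, "=")
		if ok && val == "1" && strings.HasPrefix(key, "CPU.") {
			fmt.Println("enabled:", key)
		}
	}
}
```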


@aaronoaks commented on GitHub (Apr 30, 2025):

OK, thanks for the info. Good to know where to check now.
