[GH-ISSUE #9111] Emerald Rapids host - AMX instruction not recognized #5929

Closed
opened 2026-04-12 17:16:06 -05:00 by GiteaMirror · 9 comments

Originally created by @js333031 on GitHub (Feb 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9111

Not sure if AMX is being used. I expected the line below to include some indication of AMX or other relevant CPU instructions:

`system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=72`

A snippet of `/proc/cpuinfo`:

```
model name : INTEL(R) XEON(R) GOLD 6554S

perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 tdx_host_platform cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect user_shstk avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req hfi vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd sgx_lc fsrm md_clear serialize tsxldtrk pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb ept_5level flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_mode_based_exec tsc_scaling usr_wait_pause notify_vm_exiting ipi_virt
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb tdx_pw_mce bhi
```
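For reference, the flags line above does advertise AMX (`amx_bf16`, `amx_tile`, `amx_int8`). A minimal standalone check of a flags string can be sketched like this (a hypothetical helper for illustration, not part of ollama):

```go
package main

import (
	"fmt"
	"strings"
)

// hasAMX reports whether a /proc/cpuinfo flags line advertises the AMX
// feature set: tile configuration plus at least one data-type extension.
func hasAMX(flags string) bool {
	set := make(map[string]bool)
	for _, f := range strings.Fields(flags) {
		set[f] = true
	}
	return set["amx_tile"] && (set["amx_int8"] || set["amx_bf16"])
}

func main() {
	flags := "avx2 avx512f amx_bf16 amx_tile amx_int8 flush_l1d"
	fmt.Println("AMX available:", hasAMX(flags))
}
```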

[ollama.log](https://github.com/user-attachments/files/18802126/ollama.log)

@js333031 commented on GitHub (Feb 14, 2025):

If I run llama.cpp directly, the following line is printed:

```
system_info: n_threads = 72 (n_threads_batch = 72) / 144 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | AMX_INT8 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
```
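That feature list has a regular `NAME = 0/1` shape separated by `|`, so it can be parsed mechanically. A minimal sketch (hypothetical helper, not llama.cpp or ollama code):

```go
package main

import (
	"fmt"
	"strings"
)

// parseSystemInfo turns a llama.cpp system_info feature list such as
// "CPU : SSE3 = 1 | AVX = 1 | ..." into a map of enabled features.
func parseSystemInfo(line string) map[string]bool {
	features := make(map[string]bool)
	for _, part := range strings.Split(line, "|") {
		part = strings.TrimSpace(part)
		// Strip a leading backend label such as "CPU : ".
		if i := strings.Index(part, " : "); i >= 0 {
			part = part[i+3:]
		}
		kv := strings.SplitN(part, "=", 2)
		if len(kv) != 2 {
			continue
		}
		name := strings.TrimSpace(kv[0])
		features[name] = strings.TrimSpace(kv[1]) == "1"
	}
	return features
}

func main() {
	line := "CPU : AVX = 1 | AVX2 = 1 | AMX_INT8 = 1 | LLAMAFILE = 1 |"
	f := parseSystemInfo(line)
	fmt.Println("AMX_INT8 enabled:", f["AMX_INT8"])
}
```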


@rick-github commented on GitHub (Feb 14, 2025):

The log is from version 0.5.7 of ollama, which only compiles runners with AVX/AVX2 extensions.


@js333031 commented on GitHub (Feb 14, 2025):

Should the log at least show AVX/AVX2 instructions in the system info line?

How can the AMX runner be compiled?


@rick-github commented on GitHub (Feb 14, 2025):

0.5.8+ starts a basic CPU runner that then dynamically loads libraries for extended CPU architectures or GPUs. If you set `OLLAMA_DEBUG=1` in the server environment and post the resulting logs, they will show which library was dynamically loaded.


@js333031 commented on GitHub (Feb 15, 2025):

Here's another try, this time with 0.5.11. I don't see any change.

```
(base) jays@m50fcp-1:~/data/ollama_build/ollama$ git log
commit f8453e9d4a15f5f54b610993e8647d252cb65626 (grafted, HEAD, tag: v0.5.11)
Author: Jeffrey Morgan <jmorganca@gmail.com>
Date:   Thu Feb 13 22:37:59 2025 -0800

    llm: attempt to evaluate symlinks, but do not fail (#9089)

    provides a better approach to #9088 that will attempt to
    evaluate symlinks (important for macOS where 'ollama' is
    often a symlink), but use the result of os.Executable()
    as a fallback in scenarios where filepath.EvalSymlinks
    fails due to permission erorrs or other issues
(base) jays@m50fcp-1:~/data/ollama_build/ollama$
```

[ollama.0.5.11.log](https://github.com/user-attachments/files/18807581/ollama.0.5.11.log)


@rick-github commented on GitHub (Feb 15, 2025):

```
time=2025-02-14T21:48:31.757-05:00 level=INFO source=runner.go:936 msg="starting go runner"
time=2025-02-14T21:48:31.757-05:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=72
time=2025-02-14T21:48:31.757-05:00 level=DEBUG source=ggml.go:89 msg="ggml backend load all from path" path=/mnt/data/ollama_build/ollama
```

It didn't find any dynamic libraries to load. It looks like you are running the command from `/home/jays/data/ollama_build/ollama`, yet the build path was `/mnt/data/ollama_build/ollama`. Do you have fancy symlinks/mount points that might be confusing ollama about where to find the libraries?


@js333031 commented on GitHub (Feb 15, 2025):

`/home/jays/data` is a symlink to `/mnt/data/`

```
(base) jays@m50fcp-1:~/data/ollama_build/ollama$ md5sum /mnt/data/ollama_build/ollama/ollama
6dd3ce2d15ed0b70c68c99b066f12299  /mnt/data/ollama_build/ollama/ollama
(base) jays@m50fcp-1:~/data/ollama_build/ollama$ md5sum /home/jays/data/ollama_build/ollama/ollama
6dd3ce2d15ed0b70c68c99b066f12299  /home/jays/data/ollama_build/ollama/ollama
(base) jays@m50fcp-1:~/data/ollama_build/ollama$
```

@js333031 commented on GitHub (Feb 15, 2025):

[ollama.0.5.11-amx.log](https://github.com/user-attachments/files/18811698/ollama.0.5.11-amx.log)

Some progress to report... I had cloned the repo, then run `go build .`

But that is not sufficient. Based on @rick-github's hint about not finding the libraries, I built the libraries by doing:

```
mkdir build
cd build
cmake ../
make
```

Then, running `OLLAMA_DEBUG=1 ./ollama serve`, I see AMX being utilized.

A few observations/questions:

  1. Inference appears faster now than previously but gradually slows down. This behavior wasn't observed using llama.cpp directly. It might be a quantization issue, as some layers of the model are not utilizing AMX.
  2. Is there a way to print inference statistics like llama.cpp does when the app is shut down?

@rick-github commented on GitHub (Feb 20, 2025):

https://github.com/ollama/ollama/pull/9203

  1. Can you quantify "slow down"?
  2. `ollama --verbose` is the best that ollama can offer.
Reference: github-starred/ollama#5929