[GH-ISSUE #1317] Support AVX2 #62721

Closed
opened 2026-05-03 10:05:47 -05:00 by GiteaMirror · 5 comments

Originally created by @grigio on GitHub (Nov 29, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1317

Originally assigned to: @dhiltgen on GitHub.

```
ollama        | 2023/11/29 16:18:06 images.go:784: total blobs: 34
ollama        | 2023/11/29 16:18:06 images.go:791: total unused blobs removed: 0
ollama        | 2023/11/29 16:18:06 routes.go:777: Listening on [::]:11434 (version 0.1.12)
ollama        | 2023/11/29 16:18:06 routes.go:797: warning: gpu support may not be enabled, check that you have installed GPU drivers: nvidia-smi command failed
ollama        | 2023/11/29 16:18:28 llama.go:390: skipping accelerated runner because num_gpu=0
ollama        | 2023/11/29 16:18:28 llama.go:421: starting llama runner
ollama        | 2023/11/29 16:18:28 llama.go:479: waiting for llama runner to start responding
ollama        | {"timestamp":1701274708,"level":"WARNING","function":"server_params_parse","line":2035,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1}
ollama        | {"timestamp":1701274708,"level":"INFO","function":"main","line":2534,"message":"build info","build":375,"commit":"9656026"}
ollama        | {"timestamp":1701274708,"level":"INFO","function":"main","line":2537,"message":"system info","n_threads":8,"n_threads_batch":-1,"total_threads":16,"system_info":"AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | "}
ollama        | llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /root/.ollama/models/blobs/sha256:683b485b03a488019b807a1829bb3e7b8084501f4ed6eab4b21e1100655c8e1d (version GGUF V2)
ollama        | llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  4096, 32002,     1,     1 ]
ollama        | llama_model_loader: - tensor    1:              blk.0.attn_q.weight q4_K     [  4096,  4096,     1,     1 ]
ollama        | llama_model_loader: - tensor    2:              blk.0.attn_k.weight q4_K     [  4096,  1024,     1,     1 ]
ollama        | llama_model_loader: - tensor    3:              blk.0.attn_v.weight q6_K     [  4096,  1024,     1,     1 ]
ollama        | llama_model_loader: - tensor    4:         blk.0.attn_output.weight q4_K     [  4096,  4096,     1,     1 ]
```

My CPU

```
$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 7 7700 8-Core Processor
    CPU family:          25
    Model:               97
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            2
    Frequency boost:     enabled
    CPU(s) scaling MHz:  57%
    CPU max MHz:         5388,2808
    CPU min MHz:         3000,0000
    BogoMIPS:            7586,07
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
                         pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3
                         fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
                         misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3
                         hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed
                         adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
                         cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale
                         vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl avx512vbmi umip pku
                         ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):
```
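
Note that lscpu shows the `avx2` (and even `avx512f`) flag, while the runner's system_info line above reports `AVX2 = 0`, so the limitation is in how the bundled llama.cpp was compiled, not in the hardware. As a quick cross-check (a standalone sketch, not part of ollama), golang.org/x/sys/cpu exposes the same CPUID-derived feature bits that lscpu prints:

```go
// Sketch: report the host CPU's vector capabilities from Go, independently
// of what the bundled llama.cpp binary was compiled with.
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

func main() {
	fmt.Println("AVX:    ", cpu.X86.HasAVX)     // lscpu flag: avx
	fmt.Println("AVX2:   ", cpu.X86.HasAVX2)    // lscpu flag: avx2
	fmt.Println("FMA:    ", cpu.X86.HasFMA)     // lscpu flag: fma
	fmt.Println("AVX512F:", cpu.X86.HasAVX512F) // lscpu flag: avx512f
}
```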

GiteaMirror added the feature request label 2026-05-03 10:05:47 -05:00

@grigio commented on GitHub (Nov 29, 2023):

https://github.com/jmorganca/ollama/blob/2ae80e1e27f94f66212de3fcdeadf31b49e25cc6/llm/llama.cpp/generate_linux.go#L17

Is it possible to rebuild ollama with AVX2 support, or to pass that option via a parameter?
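
For anyone wanting to try this before an official fix, here is a minimal sketch of what flipping the build flags might look like, assuming the `LLAMA_AVX2`/`LLAMA_FMA`/`LLAMA_F16C` cmake options llama.cpp exposed at the time (the exact paths, targets, and neighboring flags in generate_linux.go may differ):

```go
// Hypothetical edit to llm/llama.cpp/generate_linux.go (illustrative only;
// the real file's cmake paths and flag set may differ). Turn the
// AVX2-family defines on, then re-run `go generate ./...` and rebuild:
//go:generate cmake -S ggml -B ggml/build/cpu -DLLAMA_AVX=on -DLLAMA_AVX2=on -DLLAMA_FMA=on -DLLAMA_F16C=on
//go:generate cmake --build ggml/build/cpu --target server --config Release
```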


@easp commented on GitHub (Nov 30, 2023):

This is for compatibility, but they are planning to make this choice at runtime rather than at compile time: #1279


@jmorganca commented on GitHub (Nov 30, 2023):

Yes, thanks @easp! Will close this in favor of #1279. Will also investigate making this at least a compile-time flag for now – that part should be much simpler.


@grigio commented on GitHub (Nov 30, 2023):

> This is for compatibility, but they are planning to make this choice at runtime rather than at compile time: #1279

Thanks, I missed that!


@dhiltgen commented on GitHub (Jan 20, 2024):

With [release 0.1.21](https://github.com/jmorganca/ollama/releases/tag/v0.1.21) we now support multiple CPU-optimized variants of the LLM library. The system auto-detects the capabilities of the CPU and selects one of AVX2, AVX, or unoptimized. This works on Linux, macOS, and Windows.
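
For reference, the selection logic amounts to probing the CPU once at startup and loading the most optimized library variant it can execute. A minimal sketch using golang.org/x/sys/cpu (the variant names are illustrative; the real implementation differs in detail):

```go
// Sketch of runtime variant selection: pick the best CPU-optimized LLM
// library build the host can actually execute, falling back gracefully.
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

// pickVariant returns which CPU-optimized library build to load.
func pickVariant() string {
	switch {
	case cpu.X86.HasAVX2:
		return "cpu_avx2"
	case cpu.X86.HasAVX:
		return "cpu_avx"
	default:
		return "cpu" // unoptimized fallback
	}
}

func main() {
	fmt.Println("selected LLM library variant:", pickVariant())
}
```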


Reference: github-starred/ollama#62721