[GH-ISSUE #2332] using a legacy x86_64 cpu and GTX 1050 Ti? #27107

Closed
opened 2026-04-22 04:04:40 -05:00 by GiteaMirror · 7 comments

Originally created by @truatpasteurdotfr on GitHub (Feb 2, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2332

Hi,

I have an old machine I would like to play with:

```
$ lscpu
...
Model name:            Intel(R) Xeon(R) CPU           E5410  @ 2.33GHz
...
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf eagerfpu pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm rsb_ctxsw tpr_shadow vnmi flexpriority dtherm
```

No AVX, but the GPU is still supported (compute capability 6.1):

```
$ /c7/shared/cuda/12.1.1_530.30.02/samples/bin/x86_64/linux/release/deviceQuery
...
Device 0: "NVIDIA GeForce GTX 1050 Ti"
  CUDA Driver Version / Runtime Version          12.2 / 12.1
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 4038 MBytes (4234674176 bytes)
  (006) Multiprocessors, (128) CUDA Cores/MP:    768 CUDA Cores
...
```

I have rebuilt ollama with CUDA support, but it is not using the GPU (although it is properly detected):

```
[tru@mafalda ollama]$ ./ollama --version
Warning: could not connect to a running Ollama instance
Warning: client version is 0.1.23-0-g09a6f76
[tru@mafalda ollama]$ ./ollama serve
time=2024-02-02T17:27:46.581+01:00 level=INFO source=images.go:860 msg="total blobs: 16"
time=2024-02-02T17:27:46.583+01:00 level=INFO source=images.go:867 msg="total unused blobs removed: 0"
time=2024-02-02T17:27:46.585+01:00 level=INFO source=routes.go:995 msg="Listening on 127.0.0.1:11434 (version 0.1.23-0-g09a6f76)"
time=2024-02-02T17:27:46.585+01:00 level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-02-02T17:27:58.309+01:00 level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [cpu cuda_v1_530 cpu_avx2 cpu_avx]"
time=2024-02-02T17:27:58.310+01:00 level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-02T17:27:58.310+01:00 level=INFO source=gpu.go:242 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-02T17:27:58.318+01:00 level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/usr/lib64/libnvidia-ml.so.535.129.03]"
time=2024-02-02T17:27:58.331+01:00 level=INFO source=gpu.go:99 msg="Nvidia GPU detected"
time=2024-02-02T17:27:58.332+01:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
time=2024-02-02T17:27:58.332+01:00 level=WARN source=gpu.go:128 msg="CPU does not have AVX or AVX2, disabling GPU support."
time=2024-02-02T17:27:58.332+01:00 level=INFO source=routes.go:1018 msg="no GPU detected"
[GIN] 2024/02/02 - 17:27:59 | 200 |     100.887µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/02/02 - 17:27:59 | 200 |    1.543664ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/02/02 - 17:27:59 | 200 |    1.425633ms |       127.0.0.1 | POST     "/api/show"
time=2024-02-02T17:28:01.622+01:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
time=2024-02-02T17:28:01.622+01:00 level=WARN source=gpu.go:128 msg="CPU does not have AVX or AVX2, disabling GPU support."
time=2024-02-02T17:28:01.622+01:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
time=2024-02-02T17:28:01.622+01:00 level=WARN source=gpu.go:128 msg="CPU does not have AVX or AVX2, disabling GPU support."
time=2024-02-02T17:28:01.622+01:00 level=INFO source=llm.go:77 msg="GPU not available, falling back to CPU"
loading library /tmp/ollama2276873866/cpu/libext_server.so
...
```

The fallback to CPU works as expected and I can run it fine, albeit slowly:

```
[tru@mafalda ~]$ ollama run stablelm2 <<< ' why is the sky blue? '
The color of the sky depends on several ....
```

Why is AVX/AVX2 required to enable the GPU part?

Thanks

Tru


@remy415 commented on GitHub (Feb 2, 2024):

Because their default llama.cpp CUDA builds have AVX enabled, and if you try to run them on a CPU that doesn't have AVX, your program will crash. If you want CUDA support without AVX, you'll have to modify the GPU detection code in gpu/gpu.go so it doesn't disable the GPU when it doesn't detect AVX, and modify the generate script at llm/generate/gen_linux.sh to include a CUDA build with all the extras turned off (-DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off). Note that this is entirely out of scope of support and you'll mostly be on your own.
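The gating described in this comment can be sketched as follows. This is a stdlib-only approximation reconstructed from the log messages in this thread, not the actual gpu.go code; the function name `gpuAllowed` is hypothetical:

```go
package main

import "fmt"

// gpuAllowed is a hypothetical reconstruction of the check discussed
// above: the shipped CUDA library is itself compiled with AVX enabled,
// so (at the time of this issue) the GPU path was only taken when the
// host CPU reported AVX or AVX2.
func gpuAllowed(hasAVX, hasAVX2 bool) bool {
	return hasAVX || hasAVX2
}

func main() {
	// The Xeon E5410 in this issue has SSE4.1 but neither AVX nor AVX2,
	// so ollama logs "no GPU detected" even though a 1050 Ti is present.
	fmt.Println(gpuAllowed(false, false)) // false
}
```

The point is that the check guards the host-side CUDA wrapper code, not the GPU kernels themselves, which is why a capable GPU is ignored on a pre-AVX CPU.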


@truatpasteurdotfr commented on GitHub (Feb 2, 2024):

Thanks for your feedback; I went down the rabbit hole and got stuck there.

```
$ git diff
diff --git a/gpu/cpu_common.go b/gpu/cpu_common.go
index 3b299e4..e1caec2 100644
--- a/gpu/cpu_common.go
+++ b/gpu/cpu_common.go
@@ -15,6 +15,10 @@ func GetCPUVariant() string {
                slog.Info("CPU has AVX")
                return "avx"
        }
+       if cpu.X86.HasSSE41 {
+               slog.Info("CPU has SSE41")
+               return "sse41"
+       }
        slog.Info("CPU does not have vector extensions")
        // else LCD
        return ""
diff --git a/llm/llama.cpp b/llm/llama.cpp
--- a/llm/llama.cpp
+++ b/llm/llama.cpp
@@ -1 +1 @@
-Subproject commit d2f650cb5b04ee2726663e79b47da5efe196ce00
+Subproject commit d2f650cb5b04ee2726663e79b47da5efe196ce00-dirty
```

I rebuilt ollama with the custom `OLLAMA_CUSTOM_CPU_DEFS="-DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_F16C=off -DLLAMA_FMA=off"` flags.

I ran it and it died on an unsupported CPU instruction a little later:

```
$ ./ollama serve
time=2024-02-02T18:17:31.336+01:00 level=INFO source=images.go:860 msg="total blobs: 16"
time=2024-02-02T18:17:31.337+01:00 level=INFO source=images.go:867 msg="total unused blobs removed: 0"
time=2024-02-02T18:17:31.338+01:00 level=INFO source=routes.go:995 msg="Listening on 127.0.0.1:11434 (version 0.1.23-0-g09a6f76)"
time=2024-02-02T18:17:31.338+01:00 level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-02-02T18:17:43.121+01:00 level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [cuda_v1_530 cpu_avx2 cpu cpu_avx]"
time=2024-02-02T18:17:43.122+01:00 level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-02T18:17:43.122+01:00 level=INFO source=gpu.go:242 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-02T18:17:43.128+01:00 level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/usr/lib64/libnvidia-ml.so.535.129.03]"
time=2024-02-02T18:17:43.141+01:00 level=INFO source=gpu.go:99 msg="Nvidia GPU detected"
time=2024-02-02T18:17:43.142+01:00 level=INFO source=cpu_common.go:19 msg="CPU has SSE41"
time=2024-02-02T18:17:43.147+01:00 level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 6.1"
[GIN] 2024/02/02 - 18:19:15 | 200 |     117.257µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/02/02 - 18:19:15 | 200 |    1.899756ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2024/02/02 - 18:19:25 | 200 |      47.734µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/02/02 - 18:19:25 | 200 |    2.418251ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/02/02 - 18:19:25 | 200 |      788.16µs |       127.0.0.1 | POST     "/api/show"
time=2024-02-02T18:19:26.848+01:00 level=INFO source=cpu_common.go:19 msg="CPU has SSE41"
time=2024-02-02T18:19:26.848+01:00 level=INFO source=cpu_common.go:19 msg="CPU has SSE41"
time=2024-02-02T18:19:26.848+01:00 level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 6.1"
time=2024-02-02T18:19:26.848+01:00 level=INFO source=cpu_common.go:19 msg="CPU has SSE41"
time=2024-02-02T18:19:26.848+01:00 level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 6.1"
time=2024-02-02T18:19:26.848+01:00 level=INFO source=cpu_common.go:19 msg="CPU has SSE41"
loading library /tmp/ollama2581997617/cuda_v1_530/libext_server.so
SIGILL: illegal instruction
PC=0x7fd8b51cc06c m=7 sigcode=2
signal arrived during cgo execution
instruction bytes: 0xc5 0xf9 0xef 0xc0 0x41 0x54 0x4c 0x8d 0x24 0xd5 0x0 0x0 0x0 0x0 0x55 0x53
```

Using gdb, the trace is:

```
Program received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7fff9d7fa700 (LWP 2665)]
0x00007fff8718105c in std::vector<std::pair<unsigned int, unsigned int>, std::allocator<std::pair<unsigned int, unsigned int> > >::vector(std::initializer_list<std::pair<unsigned int, unsigned int> >, std::allocator<std::pair<unsigned int, unsigned int> > const&) ()
   from /tmp/ollama1432208079/cuda_v0_520/libext_server.so
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 libstdc++-4.8.5-44.el7.x86_64
(gdb)
(gdb) x/i $pc
=> 0x7fff8718105c <_ZNSt6vectorISt4pairIjjESaIS1_EEC2ESt16initializer_listIS1_ERKS2_+12>:
    vpxor  %xmm0,%xmm0,%xmm0
```

and `vpxor` is an AVX instruction...

Now I need to find out why AVX instructions end up in the CUDA build!
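The faulting bytes themselves confirm this: AVX instructions are VEX-encoded, and in 64-bit mode a leading 0xC5 (or 0xC4) byte is a VEX prefix. A minimal stdlib-only Go check (my own illustration for this thread, not part of ollama):

```go
package main

import "fmt"

// isVEX reports whether an x86-64 instruction begins with a VEX prefix
// (0xC4 = three-byte form, 0xC5 = two-byte form). In 64-bit mode these
// bytes always introduce VEX-encoded (AVX-family) instructions, which
// raise SIGILL on a CPU without AVX, such as the Xeon E5410 here.
func isVEX(code []byte) bool {
	return len(code) > 0 && (code[0] == 0xC4 || code[0] == 0xC5)
}

func main() {
	// First instruction bytes from the SIGILL report above:
	// c5 f9 ef c0 is the VEX encoding of vpxor %xmm0,%xmm0,%xmm0.
	fmt.Println(isVEX([]byte{0xC5, 0xF9, 0xEF, 0xC0})) // true
}
```

Here the compiler vectorized ordinary libstdc++-inlined code with AVX because the CUDA build was compiled with AVX flags on, independent of the GPU kernels.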


@remy415 commented on GitHub (Feb 2, 2024):

Yes, that's what I meant by their default CUDA builds including AVX. cpu_common.go only reports which extensions are present. You need to modify the CUDA section of llm/generate/gen_linux.sh and watch for the COMMON_CMAKE_DEFS variable at the top, as it enables AVX too. Then you need to modify the section in gpu/gpu.go that checks something like CpuVariant() == "" && arch == "x86_64" (I'm on my phone, sorry) and change the if statement to not trigger; you could simply set it to "x86_64a" or whatever, since you're not publishing the source code, and it will just skip the check.


@remy415 commented on GitHub (Feb 2, 2024):

Essentially the code shouldn't need much modification beyond this: gen_linux.sh will need a completely custom line under the CUDA section, and the -DLLAMA_AVX=on part in the common cmake defs line at the top of the script changed to off.


@truatpasteurdotfr commented on GitHub (Feb 2, 2024):

The issue was that the previous build had left AVX code all around, and it was not removed when I ran:

```
OLLAMA_CUSTOM_CPU_DEFS="-DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_F16C=off -DLLAMA_FMA=off" go generate ./...
OLLAMA_CUSTOM_CPU_DEFS="-DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_F16C=off -DLLAMA_FMA=off" go build .
```

When I cleared `llm/llama.cpp/build/linux/x86_64/` before running the go commands, it worked!

Summary of the changes for this custom build:

```
$ git diff
diff --git a/gpu/cpu_common.go b/gpu/cpu_common.go
index 3b299e4..e1caec2 100644
--- a/gpu/cpu_common.go
+++ b/gpu/cpu_common.go
@@ -15,6 +15,10 @@ func GetCPUVariant() string {
                slog.Info("CPU has AVX")
                return "avx"
        }
+       if cpu.X86.HasSSE41 {
+               slog.Info("CPU has SSE41")
+               return "sse41"
+       }
        slog.Info("CPU does not have vector extensions")
        // else LCD
        return ""
diff --git a/llm/generate/gen_common.sh b/llm/generate/gen_common.sh
index b359936..2cdfab7 100644
--- a/llm/generate/gen_common.sh
+++ b/llm/generate/gen_common.sh
@@ -40,7 +40,7 @@ init_vars() {
         ;;
     esac
    if [ -z "${CMAKE_CUDA_ARCHITECTURES}" ] ; then 
-        CMAKE_CUDA_ARCHITECTURES="50;52;61;70;75;80"
+        CMAKE_CUDA_ARCHITECTURES="61"
     fi
 }
 
diff --git a/llm/generate/gen_linux.sh b/llm/generate/gen_linux.sh
index 82c8c75..ddd253c 100755
--- a/llm/generate/gen_linux.sh
+++ b/llm/generate/gen_linux.sh
@@ -49,7 +49,7 @@ if [ -z "${CUDACXX}" ]; then
         export CUDACXX=$(command -v nvcc)
     fi
 fi
-COMMON_CMAKE_DEFS="-DCMAKE_POSITION_INDEPENDENT_CODE=on -DLLAMA_NATIVE=off -DLLAMA_AVX=on -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off"
+COMMON_CMAKE_DEFS="-DCMAKE_POSITION_INDEPENDENT_CODE=on -DLLAMA_NATIVE=off -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off"
 source $(dirname $0)/gen_common.sh
 init_vars
 git_module_setup
```

@remy415 commented on GitHub (Feb 2, 2024):

The variable CMAKE_CUDA_ARCHITECTURES can be set as an environment variable and will be picked up without having to change the script. Was it able to accommodate building with SSE?

I completely forgot about the cpp build artifacts, good catch! An easy way to clear them next time is to run `go clean` before running generate and build again.


@truatpasteurdotfr commented on GitHub (Feb 3, 2024):

Thanks for the helpful discussion, closing the issue.
