[GH-ISSUE #1435] Ollama and Xeon 5660 #26527

Closed
opened 2026-04-22 02:51:27 -05:00 by GiteaMirror · 9 comments

Originally created by @Andreh1982 on GitHub (Dec 8, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1435

Hello! I'm facing an issue running Ollama on a Dell T610 server with 64 GB of RAM and a Xeon X5660:

```
Dec 8 15:52:15 constellation kernel: [ 1529.028302] traps: ollama-runner[2213] trap invalid opcode ip:472d79 sp:7ffdbe33f680 error:0 in ollama-runner[407000+da000]
Dec 8 15:52:15 constellation ollama[2161]: 2023/12/08 15:52:15 llama.go:436: signal: illegal instruction (core dumped)
```

Please, any help with this situation? Is the processor too old to run LLM models? :(
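For reference, the "illegal instruction" signal means the ollama-runner binary executed a CPU instruction the processor does not implement, which is consistent with an AVX-enabled build running on a pre-AVX CPU. A minimal way to check what the CPU advertises from Go, assuming the golang.org/x/sys/cpu package is available (a diagnostic sketch only, not part of Ollama):

```go
// checkavx.go - print whether this CPU advertises AVX/AVX2.
// Sketch only; requires: go get golang.org/x/sys/cpu
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

func main() {
	fmt.Println("AVX: ", cpu.X86.HasAVX)  // expected false on a Westmere-era Xeon X5660
	fmt.Println("AVX2:", cpu.X86.HasAVX2) // AVX2 arrived even later (Haswell, 2013)
}
```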


@easp commented on GitHub (Dec 8, 2023):

Ollama is compiled to use the AVX instruction set. That CPU was released well over a decade ago and doesn't support AVX:
https://ark.intel.com/content/www/us/en/ark/products/47921/intel-xeon-processor-x5660-12m-cache-2-80-ghz-6-40-gt-s-intel-qpi.html

There has been some talk about relaxing that requirement, but for now your only option (besides different hardware) is downloading the repo, changing compile options, and compiling your own version. I believe that all the compile options that need to be changed are in this file: /llm/llama.cpp/generate_linux.go
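The exact contents of that file change between releases, so the following is only a rough sketch of the kind of edit involved, assuming the CPU build is driven by go:generate directives that pass llama.cpp's LLAMA_* feature flags to cmake (the directive layout and package name below are illustrative, not the actual file):

```go
// llm/llama.cpp/generate_linux.go (illustrative sketch - check the real file
// in the release you are building; directives and paths may differ).
package llm

// Build the CPU runner with the AVX family disabled so the resulting binary
// also runs on pre-2011 CPUs such as the Xeon X5660.
//go:generate cmake -S ggml -B ggml/build/cpu -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_FMA=off -DLLAMA_F16C=off
//go:generate cmake --build ggml/build/cpu --config Release
```

After editing, rebuilding Ollama from source (go generate followed by go build, as described in the project's development docs) should produce a runner that avoids AVX entirely.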


@technovangelist commented on GitHub (Dec 8, 2023):

I think it is a bit old. That CPU was released in 2010, and AVX instructions only started showing up in 2011. Since the Intel page for the X5660 makes no mention of AVX, I think it may be missing that critical feature, and maybe others. We really want Ollama to be usable by everyone who wants to try it and hope to find a way to accommodate this in the future. I have created an issue we can track to see if there is a way to get around this: #1442


@Cybervet commented on GitHub (Dec 9, 2023):

Just recompiled the repo and I can confirm that it runs fine on CPUs without AVX. I'm running it in a Proxmox LXC Debian container.
The only thing I did was set AVX=off in /llm/llama.cpp/generate_linux.go.


@Andreh1982 commented on GitHub (Dec 9, 2023):

Thank you so much, guys! I had success here; disabling AVX directly on llama.cpp also did the trick. Cheers! :)


@Andreh1982 commented on GitHub (Dec 9, 2023):

Oh, by the way, I did it on llama.cpp based on the messages in this issue, thank you!

> Just recompiled the repo and I can confirm that it runs fine on CPUs without AVX. I'm running it in a Proxmox LXC Debian container. The only thing I did was set AVX=off in /llm/llama.cpp/generate_linux.go.

Did you get any performance issues?


@Cybervet commented on GitHub (Dec 9, 2023):

It's a bit slow, but that is expected: 2.5-5 tokens/sec (Orca 2) with 4 cores assigned to the container. I am now trying to pass through the GPU to see if it makes a difference.


@Andreh1982 commented on GitHub (Dec 9, 2023):

> It's a bit slow, but that is expected: 2.5-5 tokens/sec (Orca 2) with 4 cores assigned to the container. I am now trying to pass through the GPU to see if it makes a difference.

Here the first run is really slow, but after that it goes fine, I guess. I will try later to emulate the flag using a custom CPU in Proxmox. Not having AVX instructions is a problem; many other services need it, like MongoDB for example...


@Cybervet commented on GitHub (Dec 9, 2023):

Yeah, I know. I just have a couple of workstations with dual CPUs lying around and thought I'd try them out.


@Andreh1982 commented on GitHub (Dec 9, 2023):

I just bought one last week... haha, didn't expect this kind of problem :(
