[GH-ISSUE #2903] msg="CPU does not have AVX or AVX2, disabling GPU support." #48290

Closed
opened 2026-04-28 07:36:11 -05:00 by GiteaMirror · 4 comments

Originally created by @digicr on GitHub (Mar 3, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2903

Windows Server 2022 · old CPU: Xeon X5675 · GPU: RTX 3070 · CUDA 11.8


@digicr commented on GitHub (Mar 3, 2024):

I have deployed several large models locally on this server, whose CPU does not support the AVX instruction set. All of them use the GPU without issue, except Ollama's Windows build, which reports that the CPU lacks AVX instructions and abandons the GPU entirely.
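(For reference, on a Linux host you can check whether the CPU advertises AVX by scanning the kernel's feature flags; on Windows, a tool such as Sysinternals Coreinfo reports the same information. A minimal Go sketch, not part of Ollama:)

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasFlag reports whether a /proc/cpuinfo blob contains the given CPU
// feature flag as a whole space-separated word.
func hasFlag(cpuinfo, flag string) bool {
	for _, f := range strings.Fields(cpuinfo) {
		if f == flag {
			return true
		}
	}
	return false
}

func main() {
	// Linux-only: the kernel lists feature flags in /proc/cpuinfo;
	// "avx" and "avx2" are the features Ollama's prebuilt libraries assume.
	data, _ := os.ReadFile("/proc/cpuinfo")
	fmt.Printf("avx=%v avx2=%v\n",
		hasFlag(string(data), "avx"), hasFlag(string(data), "avx2"))
}
```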


@easp commented on GitHub (Mar 3, 2024):

Ollama only compiles GPU libraries for AVX. I believe the choice was made in order to reduce the number of permutations they have to compile for.

If reducing the number of permutations is the goal, it seems more important to support GPUs on old CPUs than to support CPU-only inference on old CPUs (since the latter is so slow).


@remy415 commented on GitHub (Mar 5, 2024):

@digicr AVX was introduced more than 10 years ago, and the X5675 runs on PCIe Gen 2 (and, depending on your motherboard, may offer x16 or x8 lanes per physical slot), which will certainly impact GPU performance (minimally at Gen 2 x16, but potentially substantially at x8).

If you're keen on not upgrading your system hardware, you could always modify the code to support your system. Note you will need to ensure your dev environment is configured according to the Ollama documentation at https://github.com/ollama/ollama/blob/main/docs/development.md, then:

`git clone https://github.com/ollama/ollama.git && cd ollama`

Edit these two files:

- `llm/generate/gen_linux.sh`, line 142: add `-DLLAMA_AVX=off -DLLAMA_AVX2=off` to the end.
- `gpu/gpu.go`, line 133: change the first `&&` to `||`: `if gpuHandles.cuda != nil && (cpuVariant != "" || runtime.GOARCH != "amd64") {` → `if gpuHandles.cuda != nil || (cpuVariant != "" || runtime.GOARCH != "amd64") {`

Then, from the base folder:

`OLLAMA_SKIP_CPU_GENERATE=1 go generate ./... && go build .` (skip the CPU generate step, since you're only using the CUDA build)

The resulting binary should work with your configuration.
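(To see why the `&&` → `||` change matters, here is a standalone sketch evaluating both conditions for this machine: CUDA present, no AVX CPU variant, amd64. The function and variable names are hypothetical stand-ins for the values in `gpu/gpu.go`.)

```go
package main

import "fmt"

// originalCheck mirrors the stock condition: it takes the GPU path only
// when a CUDA handle exists AND the CPU has a non-default variant (or the
// arch is not amd64).
func originalCheck(cudaPresent bool, cpuVariant, goarch string) bool {
	return cudaPresent && (cpuVariant != "" || goarch != "amd64")
}

// patchedCheck mirrors the edited condition: it takes the GPU path whenever
// a CUDA handle exists, regardless of CPU variant.
func patchedCheck(cudaPresent bool, cpuVariant, goarch string) bool {
	return cudaPresent || (cpuVariant != "" || goarch != "amd64")
}

func main() {
	// The poster's machine: CUDA present, no AVX (empty variant), amd64.
	fmt.Println("original:", originalCheck(true, "", "amd64")) // false: GPU skipped
	fmt.Println("patched: ", patchedCheck(true, "", "amd64"))  // true: GPU used
}
```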


@dhiltgen commented on GitHub (Mar 6, 2024):

Let's close this as a dup of #2187 so we can keep track of the popularity of that feature request.

Reference: github-starred/ollama#48290