[GH-ISSUE #1279] Support CPUs without AVX #47170

Closed
opened 2026-04-28 03:24:13 -05:00 by GiteaMirror · 10 comments
Owner

Originally created by @jmorganca on GitHub (Nov 26, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1279

Originally assigned to: @dhiltgen on GitHub.

Currently, CPU instruction sets are determined at build time, meaning Ollama needs to target instruction sets supported by the widest possible range of CPUs. Instead, CPU capabilities should be detected at runtime, allowing for both speed on modern CPUs and compatibility with older or less powerful ones.
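On Linux, a minimal sketch of what runtime detection could look like is parsing the `flags` line of `/proc/cpuinfo`. This is purely illustrative and not Ollama's actual implementation; the `hasCPUFlag` helper is hypothetical:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasCPUFlag reports whether a feature flag (e.g. "avx", "avx2")
// appears on a "flags" line of text formatted like /proc/cpuinfo.
func hasCPUFlag(cpuinfo, flag string) bool {
	for _, line := range strings.Split(cpuinfo, "\n") {
		if !strings.HasPrefix(line, "flags") {
			continue
		}
		_, after, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		for _, f := range strings.Fields(after) {
			if f == flag {
				return true
			}
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("could not read /proc/cpuinfo:", err)
		return
	}
	info := string(data)
	fmt.Println("AVX: ", hasCPUFlag(info, "avx"))
	fmt.Println("AVX2:", hasCPUFlag(info, "avx2"))
}
```

In practice a Go program would more likely use the `golang.org/x/sys/cpu` package, which exposes CPUID-based feature bits portably rather than relying on a Linux-only proc file.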

GiteaMirror added the feature request label 2026-04-28 03:24:13 -05:00

@JRM73 commented on GitHub (Nov 28, 2023):

Great news! Can't wait. I can't afford to change computers, I have to make do with my old processors. I hope to be able to run Ollama on them soon. Thanks Jeffrey!


@jyap808 commented on GitHub (Dec 12, 2023):

For anyone wondering, here's how you can manually disable AVX to build Ollama.

```diff
$ git diff
diff --git a/llm/llama.cpp/generate_linux.go b/llm/llama.cpp/generate_linux.go
index ce9e78a..77c9795 100644
--- a/llm/llama.cpp/generate_linux.go
+++ b/llm/llama.cpp/generate_linux.go
@@ -14,13 +14,13 @@ package llm
 //go:generate git submodule update --force gguf
 //go:generate git -C gguf apply ../patches/0001-copy-cuda-runtime-libraries.patch
 //go:generate git -C gguf apply ../patches/0001-update-default-log-target.patch
-//go:generate cmake -S gguf -B gguf/build/cpu -DLLAMA_K_QUANTS=on -DLLAMA_NATIVE=off -DLLAMA_AVX=on -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off
+//go:generate cmake -S gguf -B gguf/build/cpu -DLLAMA_K_QUANTS=on -DLLAMA_NATIVE=off -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off
 //go:generate cmake --build gguf/build/cpu --target server --config Release
 //go:generate mv gguf/build/cpu/bin/server gguf/build/cpu/bin/ollama-runner
 
 //go:generate cmake -S ggml -B ggml/build/cuda -DLLAMA_CUBLAS=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on
 //go:generate cmake --build ggml/build/cuda --target server --config Release
 //go:generate mv ggml/build/cuda/bin/server ggml/build/cuda/bin/ollama-runner
-//go:generate cmake -S gguf -B gguf/build/cuda -DLLAMA_CUBLAS=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on -DLLAMA_NATIVE=off -DLLAMA_AVX=on -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off -DLLAMA_CUDA_PEER_MAX_BATCH_SIZE=0
+//go:generate cmake -S gguf -B gguf/build/cuda -DLLAMA_CUBLAS=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on -DLLAMA_NATIVE=off -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off -DLLAMA_CUDA_PEER_MAX_BATCH_SIZE=0
 //go:generate cmake --build gguf/build/cuda --target server --config Release
 //go:generate mv gguf/build/cuda/bin/server gguf/build/cuda/bin/ollama-runner
```

@khromov commented on GitHub (Jan 15, 2024):

I was trying to run `ollama` on an Intel® Pentium® Silver N6005 (released in 2021!), and apparently it does not support AVX, so Ollama doesn't work. So it's definitely something that affects newer processors as well.

Compiling from source as per the `README` file does work.

```
2024/01/15 23:59:10 cpu_common.go:18: CPU does not have vector extensions
```

@dhiltgen commented on GitHub (Jan 20, 2024):

With [release 0.1.21](https://github.com/jmorganca/ollama/releases/tag/v0.1.21) we now support multiple CPU-optimized variants of the LLM library. The system will auto-detect the capabilities of the CPU and select one of AVX2, AVX, or unoptimized. This works on Linux, macOS, and Windows. In particular, the unoptimized variant now works under Rosetta.
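The selection described above is essentially a priority check over detected features. A hypothetical sketch (the `CPUFeatures` type, variant names, and `selectVariant` helper are illustrative, not Ollama's actual code):

```go
package main

import "fmt"

// CPUFeatures captures the vector-extension support detected at startup.
type CPUFeatures struct {
	AVX  bool
	AVX2 bool
}

// selectVariant picks the most optimized LLM library variant the CPU
// can run: AVX2 if available, then AVX, otherwise the unoptimized build.
func selectVariant(f CPUFeatures) string {
	switch {
	case f.AVX2:
		return "cpu_avx2"
	case f.AVX:
		return "cpu_avx"
	default:
		return "cpu"
	}
}

func main() {
	fmt.Println(selectVariant(CPUFeatures{AVX: true, AVX2: true})) // most optimized
	fmt.Println(selectVariant(CPUFeatures{AVX: true}))             // AVX only
	fmt.Println(selectVariant(CPUFeatures{}))                      // unoptimized fallback
}
```

The key design point is that all variants ship in one binary and the choice happens at process startup, so a single release artifact serves both old and new CPUs.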


@GuiPoM commented on GitHub (Jan 22, 2024):

> With [release 0.1.21](https://github.com/jmorganca/ollama/releases/tag/v0.1.21) we now support multiple CPU optimized variants of the LLM library. The system will auto-detect the capabilities of the CPU and select one of AVX2, AVX, or unoptimized. This works on linux, mac, and windows. In particular, the unoptimized variant works under Rosetta now.

Hello. Is this also true for the Docker image?
I am not 100% sure that my issue is related, but I tried to debug it and the Docker container crashed with an error linked to CPU instructions.
My Intel G6400 does not have AVX or AVX2, but it does have SSE 4.1 and 4.2.
Could it be linked to a bad detection of the instruction sets it supports?
https://github.com/jmorganca/ollama/issues/2122
Edit: looking at the release date of the Docker image (11 days ago), it must be using a version older than 0.1.21, which does not include this enhancement.


@dhiltgen commented on GitHub (Jan 24, 2024):

We haven't pushed an official updated image yet, since [0.1.21](https://github.com/ollama/ollama/releases/tag/v0.1.21) is still a pre-release while we squash a few final bugs.

If you're eager to try it out, I've pushed an image up to Docker Hub at `dhiltgen/ollama:0.1.21-rc3`


@GuiPoM commented on GitHub (Jan 24, 2024):

> We haven't pushed an official updated image yet, since [0.1.21](https://github.com/ollama/ollama/releases/tag/v0.1.21) is still a pre-release while we squash a few final bugs.
>
> If you're eager to try it out, I've pushed an image up to Docker Hub at `dhiltgen/ollama:0.1.21-rc3`

Thank you! That's very kind of you.
Is it normal for there to be such an increase in size between rc2 and rc3? We go from ~500 MB to ~5 GB.
I'll try to deploy the image tonight; currently my Portainer instance is crashing due to a timeout, probably related to the image size, so I'll have to test it locally.


@dhiltgen commented on GitHub (Jan 24, 2024):

@GuiPoM we've recently added ROCm support to the container image, which required switching the base layer to include the ROCm libraries, which unfortunately are quite large. We'd prefer to have a single image that works for both NVIDIA and Radeon cards, but if this size increase is too much for your use-case, please open a new issue so we can track it.


@GuiPoM commented on GitHub (Jan 24, 2024):

> @GuiPoM we've recently added ROCm support to the container image, which required switching the base layer to include the ROCm libraries, which unfortunately are quite large. We'd prefer to have a single image that works for both NVIDIA and Radeon cards, but if this size increase is too much for your use-case, please open a new issue so we can track it.

No, that's okay, but for testing a CPU-only scenario this is huge, even on my fiber connection.
By the way, thanks to this rc3 image I managed to get ollama running as a Docker container on a non-AVX processor, so I can confirm that the image works great.


@dhiltgen commented on GitHub (Jan 24, 2024):

@GuiPoM if you just need CPU only, you could grab the binary directly from the GitHub release page and stick it into almost any modern container image base.

A simple Dockerfile like this would work:

```Dockerfile
FROM ubuntu:latest
ADD --chmod=655 https://github.com/ollama/ollama/releases/download/v0.1.21/ollama-linux-amd64 /bin/ollama
ENTRYPOINT ["/bin/ollama"]
CMD ["serve"]
```