[PR #2007] [MERGED] Add multiple CPU variants for Intel Mac #10755

Closed
opened 2026-04-12 23:09:42 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/2007
Author: @dhiltgen
Created: 1/15/2024
Status: Merged
Merged: 1/18/2024
Merged by: @dhiltgen

Base: main ← Head: cpu_fallback


📝 Commits (2)

  • 1b24974 Add multiple CPU variants for Intel Mac
  • b992bf6 Disable arm64 for test phase

📊 Changes

18 files changed (+321 additions, -186 deletions)


📝 .github/workflows/test.yaml (+4 -1)
📝 Dockerfile.build (+8 -2)
📝 llm/dyn_ext_server.c (+3 -3)
📝 llm/dyn_ext_server.go (+1 -10)
📝 llm/ext_server/CMakeLists.txt (+10 -14)
📝 llm/generate/gen_common.sh (+53 -20)
📝 llm/generate/gen_darwin.sh (+42 -23)
📝 llm/generate/gen_linux.sh (+40 -47)
📝 llm/generate/gen_windows.ps1 (+55 -15)
📝 llm/payload_common.go (+59 -33)
➖ llm/payload_darwin.go (+0 -8)
➕ llm/payload_darwin_amd64.go (+8 -0)
➕ llm/payload_darwin_arm64.go (+8 -0)
📝 llm/payload_linux.go (+1 -1)
📝 llm/payload_windows.go (+1 -1)
📝 scripts/build_darwin.sh (+23 -8)
📝 scripts/build_remote.py (+4 -0)
📝 scripts/rh_linux_deps.sh (+1 -0)

📄 Description

This also refines the build process for the ext_server build.

I had initially aimed to get rid of the gcc/g++ library generation step and rely on cmake to build a shared library, but due to toolchain quirks, this approach didn't work reliably (e.g., Linux worked since it's a consistent toolchain, and ARM Mac worked, but Intel Mac segfaulted when calling the init function pointer). This may still be achievable in a follow-up incremental PR, but for now I'll stick with g++ to create the main library we dlopen on all platforms except Windows.

Another potential follow-up is to split out the CUDA shared libraries as a discrete download and handle it in the installer script when we don't detect CUDA on the host. That would further reduce the footprint and resolve the slow initial startup caused by decompressing large payloads.

Marking draft until I have a chance to test more fully, but so far happy-path testing on Mac (Intel/ARM), Windows (CUDA), and Linux (ROCm/CUDA) looks good.

Extracting the now-compressed payloads takes some time: ~15s on my older laptop.

2024/01/15 11:12:42 payload_common.go:106: Extracting dynamic libraries...
2024/01/15 11:12:57 payload_common.go:145: Dynamic LLM libraries [rocm_v6 cpu cpu_avx2 cpu_avx cuda_v11 rocm_v5]

Uncompressed sizes once on disk:

% du -sh /tmp/ollama3226276348/*
36M	/tmp/ollama3226276348/cpu
36M	/tmp/ollama3226276348/cpu_avx
36M	/tmp/ollama3226276348/cpu_avx2
410M	/tmp/ollama3226276348/cuda_v11
30M	/tmp/ollama3226276348/rocm_v5
31M	/tmp/ollama3226276348/rocm_v6

The actual linux binary:

% ls -lh ollama-linux-amd64
-rwxrwxr-x 1 daniel daniel 294M Jan 15 11:12 ollama-linux-amd64

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:09:42 -05:00

Reference: github-starred/ollama#10755