[PR #1819] [MERGED] Support multiple LLM libs; ROCm v5 and v6; Rosetta, AVX, and AVX2 compatible CPU builds #36566

Closed
opened 2026-04-22 21:12:08 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/1819
Author: @dhiltgen
Created: 1/6/2024
Status: Merged
Merged: 1/11/2024
Merged by: @dhiltgen

Base: main ← Head: multi_variant


📝 Commits (4)

  • 8da7bef Support multiple variants for a given llm lib type
  • 052b33b DRY out the Dockefile.build
  • d88c527 Build multiple CPU variants and pick the best
  • 39928a4 Always dynamically load the llm server library

📊 Changes

33 files changed (+821 additions, -625 deletions)

View changed files

📝 Dockerfile.build (+59 -67)
📝 docs/development.md (+16 -0)
📝 docs/troubleshooting.md (+33 -2)
📝 go.mod (+1 -1)
➕ gpu/cpu_common.go (+21 -0)
📝 gpu/gpu.go (+11 -6)
📝 gpu/gpu_darwin.go (+8 -5)
📝 gpu/gpu_info_rocm.c (+26 -2)
📝 gpu/gpu_info_rocm.h (+14 -0)
📝 gpu/gpu_test.go (+1 -1)
📝 gpu/types.go (+3 -0)
📝 llm/dyn_ext_server.c (+14 -14)
📝 llm/dyn_ext_server.go (+85 -51)
📝 llm/dyn_ext_server.h (+12 -12)
📝 llm/ext_server/README.md (+16 -2)
➖ llm/ext_server_default.go (+0 -80)
➖ llm/ext_server_windows.go (+0 -12)
📝 llm/generate/gen_common.sh (+10 -0)
📝 llm/generate/gen_darwin.sh (+12 -0)
📝 llm/generate/gen_linux.sh (+67 -11)

...and 13 more files

📄 Description

In some cases we may want multiple variants for a given GPU type or CPU. This adds an optional Variant field that we can use to select the optimal library, and also lets us try multiple variants in case some fail to load.
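
Conceptually, the fallback works like this rough Go sketch (loadLibrary, tryVariants, and the candidate paths are illustrative names, not this PR's actual API):

```go
// Rough sketch of the variant-fallback idea: try candidate libraries
// in preference order and fall back when one fails to load.
package main

import (
	"errors"
	"fmt"
)

// loadLibrary stands in for dlopen-style loading of a shared library.
func loadLibrary(path string) error {
	return errors.New("stubbed out for the sketch")
}

// tryVariants walks candidate paths, most capable first, and returns
// the first variant that loads successfully.
func tryVariants(paths []string) (string, error) {
	for _, p := range paths {
		if err := loadLibrary(p); err != nil {
			fmt.Printf("failed to load %s: %v; trying next variant\n", p, err)
			continue
		}
		return p, nil
	}
	return "", errors.New("no usable LLM library variant found")
}

func main() {
	candidates := []string{"cuda/libext_server.so", "cpu_avx2/libext_server.so", "cpu/libext_server.so"}
	if v, err := tryVariants(candidates); err == nil {
		fmt.Println("using", v)
	}
}
```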

This change includes updates to Dockerfile.build to compile two ROCm variants so we can support both v5 and v6.
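
Picking between the two builds at runtime only needs the installed ROCm major version; here is one way such a probe could look in Go (the version-file path and the rocm_v5/rocm_v6 variant names are assumptions for illustration; the PR's actual probing lives in gpu/gpu_info_rocm.c):

```go
// Sketch: choose a ROCm library variant by probing the installed
// ROCm version. Paths and names are illustrative assumptions.
package main

import (
	"fmt"
	"os"
	"strings"
)

func rocmVariant() string {
	// /opt/rocm/.info/version holds a string like "5.7.1-..." on many installs.
	data, err := os.ReadFile("/opt/rocm/.info/version")
	if err != nil {
		return "rocm_v6" // assumed default when the version can't be read
	}
	if strings.HasPrefix(strings.TrimSpace(string(data)), "5") {
		return "rocm_v5"
	}
	return "rocm_v6"
}

func main() { fmt.Println("selected variant:", rocmVariant()) }
```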

I've also added multiple CPU variants and runtime detection logic, so we can support both a lowest-common-denominator build for really old CPUs (and Rosetta emulation on macOS) and builds for more modern CPUs. At present, llama.cpp does not verify CPU features, so loading the wrong CPU variant will crash the whole process with an illegal instruction. Ollama should autodetect the optimal LLM library variant for the given system, but I've also added a fail-safe mechanism so users can force a specific library to work around problems should they arise.
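
The detection boils down to checking CPU feature flags; a compact Go sketch using golang.org/x/sys/cpu (the variant names, and the OLLAMA_LLM_LIBRARY override for the fail-safe, are shown here as assumptions, not necessarily the PR's exact spelling):

```go
// Sketch of CPU-feature-based variant selection, in the spirit of
// gpu/cpu_common.go. Variant names and the override are illustrative.
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/cpu"
)

func cpuVariant() string {
	// Fail-safe: let users force a specific library to work around problems.
	if forced := os.Getenv("OLLAMA_LLM_LIBRARY"); forced != "" {
		return forced
	}
	switch {
	case cpu.X86.HasAVX2:
		return "cpu_avx2"
	case cpu.X86.HasAVX:
		return "cpu_avx"
	default:
		return "cpu" // lowest common denominator; also safe under Rosetta
	}
}

func main() { fmt.Println("selected variant:", cpuVariant()) }
```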

This also converges the LLM library model to use dynamic loading for all scenarios instead of having a built-in static link for macOS and Linux. Windows was always fully dynamic; now Linux and macOS follow the same pattern, so I was able to clean up the implementation and reduce some unnecessary complexity.
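
On Linux and macOS this dynamic loading is a plain dlopen/dlsym pattern (the real plumbing is in llm/dyn_ext_server.{c,go,h}); a minimal self-contained cgo sketch, with a hypothetical library path and entry-point symbol:

```go
// Minimal cgo sketch of dlopen-based library loading on Linux/macOS.
// The symbol name "llm_server_init" is hypothetical.
package main

/*
#cgo LDFLAGS: -ldl
#include <dlfcn.h>
#include <stdlib.h>
*/
import "C"

import (
	"fmt"
	"unsafe"
)

func dlopenVariant(path string) error {
	cpath := C.CString(path)
	defer C.free(unsafe.Pointer(cpath))

	handle := C.dlopen(cpath, C.RTLD_NOW)
	if handle == nil {
		return fmt.Errorf("dlopen %s failed: %s", path, C.GoString(C.dlerror()))
	}
	// Resolve an entry point before committing to this variant.
	sym := C.CString("llm_server_init") // hypothetical symbol
	defer C.free(unsafe.Pointer(sym))
	if C.dlsym(handle, sym) == nil {
		C.dlclose(handle)
		return fmt.Errorf("missing entry point in %s", path)
	}
	return nil
}

func main() {
	if err := dlopenVariant("/tmp/libext_server.so"); err != nil {
		fmt.Println(err)
	}
}
```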

Fixes #1868
Fixes #1821


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
