[PR #6570] [CLOSED] llama: opt-in at build time #17425

Closed
opened 2026-04-16 06:02:21 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/6570
Author: @dhiltgen
Created: 8/30/2024
Status: Closed

Base: jmorganca/llama ← Head: go_server


📝 Commits (10+)

  • adafbf9 add sync of llama.cpp
  • c815358 Initial llama Go module
  • 8523233 fix .gitattributes
  • 211a8b0 Add missing hipcc flags
  • 00a8a37 Update README.md
  • 9b11f27 Update README.md
  • 87868a9 Update README.md
  • 6b3e946 Update README.md
  • c08bb0b rename server to runner
  • ad64db1 wip...

📊 Changes

299 files changed (+165277 additions, -898 deletions)

View changed files

📝 .dockerignore (+2 -0)
📝 .gitattributes (+2 -0)
📝 .github/workflows/release.yaml (+180 -29)
📝 .github/workflows/test.yaml (+1 -42)
📝 .gitignore (+3 -1)
📝 Dockerfile (+91 -60)
➕ Dockerfile.new (+221 -0)
➕ build/darwin/amd64/placeholder (+1 -0)
➕ build/darwin/arm64/placeholder (+1 -0)
➕ build/embed_darwin_amd64.go (+8 -0)
➕ build/embed_darwin_arm64.go (+8 -0)
➕ build/embed_linux.go (+6 -0)
➕ build/embed_unused.go (+8 -0)
➕ build/linux/amd64/placeholder (+1 -0)
➕ build/linux/arm64/placeholder (+1 -0)
📝 envconfig/config.go (+0 -48)
➖ gpu/assets.go (+0 -148)
📝 gpu/gpu.go (+3 -4)
📝 integration/concurrency_test.go (+1 -1)
📝 integration/utils_test.go (+1 -1)

...and 80 more files

📄 Description

This PR layers on #6547 for the new Go server.

Unfortunately, the payload sizes are too large for the opt-in strategy to work at runtime (the Linux tgz would significantly exceed GitHub's 2G artifact size limit), so this PR applies the opt-in strategy at build time instead.

Notable refinements:

  • The ggml library is moved out of the binary and shipped as a payload in the tar file to reduce binary size, and the library names are adjusted to avoid clashes between CUDA v11, CUDA v12, and ROCm.
  • The static cgo wiring for the main app is shifted over to the new llama package, and the old go generate wiring for the static build is removed as it is no longer needed.
  • An initial foundation for requirement information is added to the runner, so that eventually compatible runners can be picked more easily.
  • The CPU vector flags are now used when compiling the GPU runners.

I'm still working through verifying all the build stages, so I'll mark this as a draft until I confirm they're all correct.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 06:02:21 -05:00

Reference: github-starred/ollama#17425