[PR #5034] [MERGED] Re-introduce the llama package #11652

Closed
opened 2026-04-12 23:34:47 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/5034
Author: @jmorganca
Created: 6/13/2024
Status: Merged
Merged: 10/8/2024
Merged by: @dhiltgen

Base: main ← Head: jmorganca/llama


📝 Commits (10+)

  • a07f5cb Re-introduce the llama package
  • f8c11a5 cache: Clear old KV cache entries when evicting a slot
  • 3d602d7 doc: explain golang objc linker warning (#6830)
  • 57971c6 llama: gather transitive dependencies for rocm for dist packaging (#6848)
  • 0db090d Refine go server makefiles to be more DRY (#6924)
  • ae0c6f0 llama: don't create extraneous directories (#6988)
  • 4023e3a llama: Exercise the new build in CI (#6989)
  • 7ad6251 llama: Refine developer docs for Go server (#6842)
  • 3509b02 runner.go: Allocate batches for all sequences during init
  • a16399a llama.go: Don't return nil from Tokenize on zero length input

📊 Changes

289 files changed (+166143 additions, -166 deletions)

View changed files

📝 .gitattributes (+2 -0)
📝 .github/workflows/test.yaml (+54 -0)
📝 .gitignore (+0 -1)
📝 Dockerfile (+2 -10)
📝 docs/development.md (+182 -1)
📝 envconfig/config.go (+3 -0)
📝 integration/concurrency_test.go (+1 -1)
📝 integration/utils_test.go (+1 -1)
➕ llama/.gitignore (+3 -0)
➕ llama/Dockerfile (+221 -0)
➕ llama/Makefile (+54 -0)
➕ llama/README.md (+100 -0)
➕ llama/base64.hpp (+392 -0)
➕ llama/build-info.cpp (+30 -0)
➕ llama/clip.cpp (+2689 -0)
➕ llama/clip.h (+120 -0)
➕ llama/common.cpp (+3688 -0)
➕ llama/common.h (+514 -0)
➕ llama/ggml-aarch64.c (+2206 -0)
➕ llama/ggml-aarch64.h (+65 -0)

...and 80 more files

📄 Description

This PR brings back the llama package, making it possible to call llama.cpp and ggml APIs from Go directly via CGo. This has a few advantages:

  1. C APIs can be called directly from Go without needing to use the previous "server" REST API (see the sketch after this list)
  2. On macOS and for CPU builds on Linux and Windows, Ollama can be built without a go generate ./... step, making it easy to get up and running to hack on parts of Ollama that don't require fast inference
  3. Faster build times for AVX, AVX2, CUDA, and ROCm (a full build of all runners takes under 5 minutes on a fast CPU)
  4. No git submodule, making it easier to clone and build from source
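
As a sketch of what item 1 enables, here is what calling the bindings from Go might look like. The import path matches the package this PR adds, but BackendInit, LoadModelFromFile, and ModelParams are illustrative assumptions; only the existence of a Tokenize function is confirmed by the commit list above, and its exact signature here is also assumed.

package main

import (
	"fmt"

	"github.com/ollama/ollama/llama"
)

func main() {
	// Hypothetical: initialize the ggml backend once per process.
	llama.BackendInit()

	// Hypothetical: load a GGUF model directly via CGo, with no
	// intermediate REST server.
	model, err := llama.LoadModelFromFile("model.gguf", llama.ModelParams{})
	if err != nil {
		panic(err)
	}

	// Tokenize exists per the commit log; the (text, addSpecial,
	// parseSpecial) signature is an assumption.
	tokens, err := model.Tokenize("Why is the sky blue?", true, true)
	if err != nil {
		panic(err)
	}
	fmt.Println("token count:", len(tokens))
}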

This is a big PR, but much of it is vendored code. The exceptions are:

  1. llama.go: the CGo bindings (the binding pattern is sketched after this list)
  2. example/: a simple example of running inference
  3. runner/: a subprocess server designed to replace the llm/ext_server package
  4. Makefile: a Makefile, kept as minimal as possible, that builds the runner package for different targets (cpu, avx, avx2, cuda, rocm)
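
As a sketch of the binding pattern referenced in item 1, a Go function in the llama package can wrap a llama.cpp C call directly through a cgo preamble. llama_print_system_info is a real llama.cpp API that returns a static string of compiled-in CPU features; the wrapper and cgo flags below are illustrative, not copied from the PR.

package llama

/*
#cgo CXXFLAGS: -std=c++11
#include "llama.h"
*/
import "C"

// PrintSystemInfo wraps llama.cpp's llama_print_system_info, which
// reports the CPU features (AVX, AVX2, ...) the library was built with.
func PrintSystemInfo() string {
	return C.GoString(C.llama_print_system_info())
}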

The easiest way to try out the PR:

cd llama
make -j

This will produce ollama_runner binaries for the current platform.
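
If the Makefile exposes the targets listed above as individual goals (an assumption; the description only names cpu, avx, avx2, cuda, and rocm as target variants), building a single variant might look like:

cd llama
make cuda -j   # hypothetical goal name: build only the CUDA runner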


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:34:47 -05:00
Reference: github-starred/ollama#11652