[PR #1146] [MERGED] Add cgo implementation for llama.cpp #57178

Closed
opened 2026-04-29 11:45:39 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/1146
Author: @dhiltgen
Created: 11/16/2023
Status: Merged
Merged: 12/22/2023
Merged by: @dhiltgen

Base: main ← Head: ext_server_cgo


📝 Commits (10+)

  • 811b1f0 deprecate ggml
  • 5e7fd69 Update images.go
  • d4cd695 Add cgo implementation for llama.cpp
  • f8ef443 Use build tags to generate accelerated binaries for CUDA and ROCm on Linux.
  • 35934b2 Adapted rocm support to cgo based llama.cpp
  • 89bbaaf Build linux using ubuntu 20.04
  • 9adca7f Bump llama.cpp to b1662 and set n_parallel=1
  • 5108253 Add automated test for multimodal
  • 1b991d0 Refine build to support CPU only
  • 3269535 Refine handling of shim presence
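
Commit f8ef443 above generates separate accelerated binaries by gating cgo files behind Go build tags. Below is a minimal sketch of that pattern, assuming a hypothetical `cuda` tag and illustrative linker flags; the PR's exact tags and flags may differ.

```go
//go:build cuda

// This file is compiled only with `go build -tags cuda`, so the
// resulting binary links against the CUDA build of llama.cpp.
// The tag name and LDFLAGS below are assumptions for illustration.
package llm

/*
#cgo LDFLAGS: -lcudart -lcublas
*/
import "C"

// Accelerator reports which backend this binary was built for.
const Accelerator = "cuda"
```

A ROCm variant would mirror this file under a `rocm` tag, and an untagged CPU-only file would cover the default build.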

📊 Changes

55 files changed (+3205 additions, -1179 deletions)

View changed files

📝 .dockerignore (+1 -2)
📝 .gitignore (+2 -1)
📝 .gitmodules (+0 -5)
📝 Dockerfile.build (+57 -14)
📝 README.md (+8 -2)
📝 cmd/cmd.go (+22 -2)
📝 docs/development.md (+32 -3)
📝 go.mod (+1 -1)
📝 go.sum (+2 -1)
➕ gpu/gpu.go (+134 -0)
➕ gpu/gpu_darwin.go (+41 -0)
➕ gpu/gpu_info.h (+49 -0)
➕ gpu/gpu_info_cpu.c (+42 -0)
➕ gpu/gpu_info_cuda.c (+106 -0)
➕ gpu/gpu_info_cuda.h (+35 -0)
➕ gpu/gpu_info_rocm.c (+114 -0)
➕ gpu/gpu_info_rocm.h (+36 -0)
➕ gpu/gpu_test.go (+26 -0)
➕ gpu/types.go (+11 -0)
➕ llm/dynamic_shim.c (+136 -0)

...and 35 more files
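
The new gpu/ package above (gpu.go plus per-vendor gpu_info_*.c probes) indicates that runtime GPU detection is done through cgo. The sketch below is a guess at the shape of that API based only on the file names; the type, fields, and function are assumptions, not the PR's actual code.

```go
// Package gpu: speculative sketch of the detection API suggested by
// gpu/types.go and gpu/gpu.go. All names here are assumptions.
package gpu

// GpuInfo would hold what the per-vendor C probes report back.
type GpuInfo struct {
	Library     string // "cuda", "rocm", or "cpu"
	TotalMemory uint64 // device memory in bytes
	FreeMemory  uint64 // currently free device memory in bytes
}

// GetGPUInfo would call into gpu_info_cuda.c / gpu_info_rocm.c via
// cgo and fall back to a CPU-only result when no driver is found.
func GetGPUInfo() GpuInfo {
	// cgo probe calls elided in this sketch
	return GpuInfo{Library: "cpu"}
}
```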

📄 Description

This change revamps the way ollama wires up llama.cpp for GGUF models: instead of running llama.cpp
as a subprocess, ollama now links it directly via cgo. Within llama.cpp, a thin facade has been added
to server.cpp (via an included patch) that exposes the main logic through extern "C", keeping changes
to the existing LLM interface to a minimum.
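
As a rough illustration of what that facade makes possible on the Go side, here is a minimal cgo sketch; the symbol names (ext_server_init, ext_server_completion) and their signatures are invented for illustration and are not the PR's actual API.

```go
package llm

/*
#include <stdlib.h>

// Hypothetical extern "C" facade compiled from server.cpp; these
// prototypes are illustrative, not the PR's actual symbols.
extern void  ext_server_init(const char *model_path);
extern char *ext_server_completion(const char *json_request);
*/
import "C"

import "unsafe"

// LoadModel brings the model up in-process rather than spawning a
// llama.cpp server subprocess, which is the core of this change.
func LoadModel(path string) {
	cPath := C.CString(path)
	defer C.free(unsafe.Pointer(cPath))
	C.ext_server_init(cPath)
}

// Complete passes a JSON request straight into the linked server
// logic and returns the C string it hands back.
func Complete(jsonReq string) string {
	cReq := C.CString(jsonReq)
	defer C.free(unsafe.Pointer(cReq))
	cResp := C.ext_server_completion(cReq)
	defer C.free(unsafe.Pointer(cResp)) // assumes the facade malloc's the response
	return C.GoString(cResp)
}
```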

Mac, Linux, and Windows are supported and manually tested.

Carries #1268 and #814


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 11:45:39 -05:00
