[PR #15122] runner: Remove CGO engines, use llama-server exclusively for GGML models #61736

Open
opened 2026-04-29 16:45:57 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15122
Author: @dhiltgen
Created: 3/28/2026
Status: 🔄 Open

Base: main ← Head: llama-runner


📝 Commits (4)

  • df93105 manifest: add manifest list support
  • a7161c5 runner: Remove CGO engines, use llama-server exclusively for GGML models
  • fb2428a broad lint fixes to sidestep CI scope glitch
  • 18cb5c1 compat: migrate local GGUFs into manifest lists

📊 Changes

1024 files changed (+22029 additions, -427021 deletions)

View changed files

📝 .github/workflows/release.yaml (+66 -13)
📝 .github/workflows/test.yaml (+25 -23)
📝 CMakeLists.txt (+10 -148)
📝 CMakePresets.json (+2 -140)
📝 Dockerfile (+111 -78)
➕ LLAMA_CPP_VERSION (+1 -0)
➖ Makefile.sync (+0 -76)
📝 api/client.go (+15 -0)
📝 api/types.go (+26 -2)
📝 app/ollama.iss (+0 -48)
📝 cmd/cmd.go (+323 -35)
📝 cmd/cmd_test.go (+284 -0)
📝 cmd/interactive.go (+4 -3)
📝 cmd/launch/hermes_test.go (+0 -9)
📝 cmd/tui/signin_test.go (+0 -1)
📝 cmd/warn_thinking_test.go (+1 -1)
➕ compatmigrate/bakllava.go (+69 -0)
➕ compatmigrate/deepseek_ocr.go (+202 -0)
➕ compatmigrate/disk_unix.go (+14 -0)
➕ compatmigrate/disk_windows.go (+24 -0)

...and 80 more files

📄 Description

Marking draft as this is still WIP.

Remove the vendored GGML and llama.cpp backend, the CGO runner, the GGML-based Go model implementations, and the sample code. llama-server (built from upstream llama.cpp via FetchContent) is now the sole inference engine for GGUF-based models. (Safetensor-based models continue to run on the new MLX engine.) This allows us to pick up new capabilities from llama.cpp more rapidly as they come out.
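The FetchContent mechanism mentioned above could look roughly like the sketch below. This is a hedged illustration, not the PR's actual CMakeLists.txt: the repository URL, option names, and the `<pinned-version>` placeholder (which would presumably come from the new LLAMA_CPP_VERSION file) are assumptions for illustration.

```cmake
# Sketch: pull upstream llama.cpp at a pinned version and build it in-tree,
# so llama-server can ship as the sole GGUF inference engine.
include(FetchContent)

FetchContent_Declare(
  llama_cpp
  GIT_REPOSITORY https://github.com/ggml-org/llama.cpp.git
  GIT_TAG        <pinned-version>  # placeholder; real pin lives elsewhere
)
FetchContent_MakeAvailable(llama_cpp)
```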

On Windows, this now requires recent AMD driver versions that support ROCm v7, as llama.cpp currently does not support building against v6.

The conversion code for models is still WIP. For models where the existing library version is incompatible with llama.cpp, new models are pushed to dhiltgen/xxx, with temporary routing logic in this branch to translate library/xxx -> dhiltgen/xxx. For example, `ollama run gpt-oss:20b` will temporarily map to `dhiltgen/gpt-oss:20b`, which is llama.cpp-compatible. Before merging, the namespace routing logic will be removed, and compatible models will be pushed to the library with selection logic to ensure the correct versions are used.
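The temporary namespace translation described above could be sketched as follows. This is a hypothetical illustration, not the PR's actual implementation: the function name `remapNamespace`, the `remappedModels` allowlist, and its contents are all assumed for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// remappedModels is a hypothetical allowlist of library models whose
// published GGUFs are not yet llama.cpp-compatible.
var remappedModels = map[string]bool{
	"gpt-oss": true,
}

// remapNamespace translates "library/<model>[:tag]" (or a bare
// "<model>[:tag]") to "dhiltgen/<model>[:tag]" when the model is on the
// remap list; all other names pass through unchanged.
func remapNamespace(name string) string {
	base := strings.TrimPrefix(name, "library/")
	model, _, _ := strings.Cut(base, ":")
	if remappedModels[model] {
		return "dhiltgen/" + base
	}
	return name
}

func main() {
	fmt.Println(remapNamespace("gpt-oss:20b"))         // dhiltgen/gpt-oss:20b
	fmt.Println(remapNamespace("library/gpt-oss:20b")) // dhiltgen/gpt-oss:20b
	fmt.Println(remapNamespace("llama3:8b"))           // llama3:8b
}
```

As the PR notes, this routing is scoped to the branch and is slated for removal before merge.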


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 16:45:57 -05:00
Reference: github-starred/ollama#61736