[PR #15122] runner: Remove CGO engines, use llama-server exclusively for GGML models #46287

Open
opened 2026-04-25 01:46:05 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15122
Author: @dhiltgen
Created: 2026-03-28
Status: 🔄 Open

Base: main ← Head: llama-runner


📝 Commits (2)

  • 36d5cfc runner: Remove CGO engines, use llama-server exclusively for GGML models
  • b5f1190 broad lint fixes to sidestep CI scope glitch

📊 Changes

944 files changed (+6112 additions, -423678 deletions)

View changed files

📝 .github/workflows/release.yaml (+66 -13)
📝 .github/workflows/test.yaml (+25 -23)
📝 CMakeLists.txt (+10 -148)
📝 CMakePresets.json (+2 -140)
📝 Dockerfile (+109 -78)
➕ LLAMA_CPP_VERSION (+1 -0)
➖ Makefile.sync (+0 -76)
📝 app/ollama.iss (+0 -48)
📝 cmd/cmd.go (+38 -6)
📝 cmd/launch/hermes_test.go (+0 -9)
📝 cmd/tui/signin_test.go (+0 -1)
📝 convert/convert.go (+51 -2)
📝 convert/convert_deepseekocr.go (+112 -13)
📝 convert/convert_gemma3.go (+225 -38)
📝 convert/convert_gemma3n.go (+72 -4)
📝 convert/convert_gemma4.go (+175 -58)
📝 convert/convert_glm4moelite.go (+39 -30)
📝 convert/convert_glmocr.go (+165 -136)
📝 convert/convert_gptoss.go (+26 -19)
📝 convert/convert_qwen25vl.go (+140 -16)

...and 80 more files

📄 Description

Marking as draft since this is still a WIP.

Remove the vendored GGML and llama.cpp backend, the CGO runner, the GGML-based Go model implementations, and the sample package. llama-server (built from upstream llama.cpp via FetchContent) is now the sole inference engine for GGUF-based models. (Safetensors-based models continue to run on the new MLX engine.) This lets us pick up new capabilities from llama.cpp more rapidly as they land upstream.
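
For illustration only, here is a minimal Go sketch of the dispatch this paragraph describes. The `startRunner` and `startMLX` names and the MLX hand-off are assumptions for the sketch, not the PR's actual code; llama-server's `--model` and `--port` flags do exist upstream.

```go
package runner

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// startRunner sketches the post-PR dispatch: GGUF models are served by an
// external llama-server process (the sole inference path for GGUF), while
// safetensors models stay on the MLX engine. Hypothetical illustration.
func startRunner(modelPath string, port int) (*exec.Cmd, error) {
	switch filepath.Ext(modelPath) {
	case ".gguf":
		// Spawn llama-server built from upstream llama.cpp.
		cmd := exec.Command("llama-server",
			"--model", modelPath,
			"--port", fmt.Sprintf("%d", port))
		return cmd, cmd.Start()
	case ".safetensors":
		// Safetensors-based models continue to run on the MLX engine.
		return nil, startMLX(modelPath, port)
	default:
		return nil, fmt.Errorf("unsupported model format: %s", modelPath)
	}
}

// startMLX is a placeholder; the MLX engine is out of scope for this sketch.
func startMLX(modelPath string, port int) error {
	return fmt.Errorf("MLX engine not shown in this sketch")
}
```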

On Windows this now requires recent AMD driver versions that support ROCm v7, as llama.cpp currently does not support building against v6.

The model conversion code is still WIP. For models where the existing library version is incompatible with llama.cpp, new models are pushed to dhiltgen/xxx, with temporary routing logic in this branch to translate library/xxx -> dhiltgen/xxx. For example, `ollama run gpt-oss:20b` will temporarily map to `dhiltgen/gpt-oss:20b`, which is llama.cpp-compatible. Before merging, the namespace routing logic will be removed and compatible models will be pushed to the library, with selection logic to ensure the correct versions are used.
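
As a rough sketch of the temporary routing described above (the rewrite table, function name, and everything except the gpt-oss example are hypothetical; the branch's actual logic may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// remapped lists library models whose current library versions are
// incompatible with llama.cpp. Illustrative only: gpt-oss is the one
// model named in the PR description.
var remapped = map[string]bool{
	"gpt-oss": true,
}

// resolveModel applies the temporary routing: library/<name>:<tag>
// becomes dhiltgen/<name>:<tag> for remapped models. Bare names like
// "gpt-oss:20b" are treated as library/ models.
func resolveModel(ref string) string {
	name, tag, _ := strings.Cut(ref, ":")
	name = strings.TrimPrefix(name, "library/")
	if remapped[name] {
		if tag == "" {
			return "dhiltgen/" + name
		}
		return fmt.Sprintf("dhiltgen/%s:%s", name, tag)
	}
	return ref
}

func main() {
	fmt.Println(resolveModel("gpt-oss:20b")) // dhiltgen/gpt-oss:20b
	fmt.Println(resolveModel("llama3:8b"))   // llama3:8b (unchanged)
}
```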


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 01:46:05 -05:00