[PR #15122] runner: Remove CGO engines, use llama-server exclusively for GGML models #77333

Open
opened 2026-05-05 10:00:26 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15122
Author: @dhiltgen
Created: 3/28/2026
Status: 🔄 Open

Base: main ← Head: llama-runner


📝 Commits (5)

  • 4d6d6dd broad lint fixes to sidestep CI scope glitch
  • 869e724 runner: Remove CGO engines, use llama-server exclusively for GGML models
  • 417e731 llama/compat: load Ollama-format GGUFs in llama-server
  • dfd8d88 server: remove embeddings compat redirect
  • 7f166a8 runner: iteration...

📊 Changes

973 files changed (+12081 additions, -426187 deletions)

View changed files

📝 .github/workflows/release.yaml (+66 -16)
📝 .github/workflows/test.yaml (+25 -23)
📝 CMakeLists.txt (+10 -148)
📝 CMakePresets.json (+2 -140)
📝 Dockerfile (+131 -81)
➕ LLAMA_CPP_VERSION (+1 -0)
➖ Makefile.sync (+0 -76)
📝 app/ollama.iss (+0 -48)
📝 cmd/cmd.go (+38 -6)
📝 cmd/tui/signin_test.go (+0 -1)
📝 convert/convert.go (+51 -2)
📝 convert/convert_deepseekocr.go (+112 -13)
📝 convert/convert_gemma3.go (+225 -38)
📝 convert/convert_gemma3n.go (+72 -4)
📝 convert/convert_gemma4.go (+175 -58)
📝 convert/convert_glm4moelite.go (+39 -30)
📝 convert/convert_glmocr.go (+165 -136)
📝 convert/convert_gptoss.go (+26 -19)
📝 convert/convert_qwen25vl.go (+140 -16)
📝 convert/convert_qwen3vl.go (+154 -17)

...and 80 more files

📄 Description

Marking as draft since this is still WIP.

Remove the vendored GGML and llama.cpp backend, the CGO runner, the GGML-based Go model implementations, and sample code. llama-server (built from upstream llama.cpp via FetchContent) is now the sole inference engine for GGUF-based models. (Safetensors-based models continue to run on the new MLX engine.) This allows us to pick up new capabilities from llama.cpp more rapidly as they come out.
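The engine split described above (GGUF → llama-server, Safetensors → MLX) could be dispatched along these lines. This is a hypothetical sketch, not the branch's actual code; the function and engine names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// selectEngine sketches the post-PR dispatch: GGUF models are served by
// llama-server, Safetensors models by the MLX engine. The function name
// and return values are illustrative assumptions, not code from the PR.
func selectEngine(modelFormat string) (string, error) {
	switch strings.ToLower(modelFormat) {
	case "gguf":
		return "llama-server", nil
	case "safetensors":
		return "mlx", nil
	default:
		return "", fmt.Errorf("unsupported model format: %q", modelFormat)
	}
}

func main() {
	engine, _ := selectEngine("gguf")
	fmt.Println(engine) // llama-server
}
```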

On Windows, this now requires recent AMD driver versions to support ROCm v7, as llama.cpp currently does not support building against v6.

The conversion code for models is still WIP. For models where the existing library version is incompatible with llama.cpp, new models are pushed to dhiltgen/xxx, with temporary routing logic in this branch to translate library/xxx -> dhiltgen/xxx. For example, ollama run gpt-oss:20b will temporarily map to dhiltgen/gpt-oss:20b, which is llama.cpp-compatible. Before merging, the namespace routing logic will be removed, and compatible models will be pushed to the library with selection logic to ensure the correct versions are used.
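The temporary library/xxx -> dhiltgen/xxx translation could look roughly like the following. This is a minimal sketch under stated assumptions: the helper name, the override map, and its entries are illustrative, not the actual routing code in this branch.

```go
package main

import (
	"fmt"
	"strings"
)

// compatOverrides lists model names whose library/ version is not yet
// llama.cpp-compatible. The map and its contents are illustrative only.
var compatOverrides = map[string]bool{
	"gpt-oss": true,
}

// rerouteModel sketches the temporary namespace translation described in
// the PR: library/<name> maps to dhiltgen/<name> for overridden models,
// while everything else passes through unchanged.
func rerouteModel(ref string) string {
	const libPrefix = "library/"
	if !strings.HasPrefix(ref, libPrefix) {
		return ref
	}
	name := strings.TrimPrefix(ref, libPrefix)
	// Strip the tag (e.g. ":20b") before checking the override table.
	base, _, _ := strings.Cut(name, ":")
	if compatOverrides[base] {
		return "dhiltgen/" + name
	}
	return ref
}

func main() {
	fmt.Println(rerouteModel("library/gpt-oss:20b")) // dhiltgen/gpt-oss:20b
	fmt.Println(rerouteModel("library/llama3:8b"))   // library/llama3:8b
}
```

Keeping the override set in one table like this makes the eventual removal before merge a small, mechanical change.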


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 10:00:26 -05:00

Reference: github-starred/ollama#77333