[PR #1067] [CLOSED] add custom ollama-runner #36329

Closed
opened 2026-04-22 21:00:33 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/1067
Author: @BruceMacD
Created: 11/10/2023
Status: Closed

Base: main ← Head: brucemacd/llama-cpp-server


📝 Commits (1)

91a47f9 add custom ollama-runner

📊 Changes

8 files changed (+34871 additions, -103 deletions)

View changed files

📝 llm/llama.cpp/generate_darwin_amd64.go (+3 -3)
📝 llm/llama.cpp/generate_darwin_arm64.go (+3 -3)
📝 llm/llama.cpp/generate_linux.go (+3 -2)
📝 llm/llama.cpp/generate_windows.go (+3 -2)
📝 llm/llama.cpp/gguf (+1 -1)
➕ llm/llama.cpp/patches/0001-custom-ollama-runner.patch (+34857 -0)
➖ llm/llama.cpp/patches/0001-metal-handle-ggml_scale-for-n-4-0-close-3754.patch (+0 -91)
📝 llm/llama.go (+1 -1)

📄 Description

  • update llama.cpp examples with custom ollama-runner
  • update llama.cpp gguf version to latest

This change adds a custom inference server to llama.cpp based on the server we use in the current version, but with excess features removed. This allows us to have a more stable interface to build on when llama.cpp updates.

To review this, pull down the changes, run go generate ./..., and review the contents of llm/llama.cpp/gguf/examples/ollama-runner.
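The go generate step applies the patch files under llm/llama.cpp/patches to the vendored llama.cpp checkout before building. A minimal sketch of that patch round-trip (a throwaway demo repo with hypothetical file and commit names, not the actual ollama generate script):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
printf 'hello\n' > file.txt
git add file.txt
git -c user.email=a@b -c user.name=t commit -qm "add file"
# Make a local change and capture it as a patch file, the way
# 0001-custom-ollama-runner.patch captures the custom runner:
printf 'hello\nworld\n' > file.txt
git -c user.email=a@b -c user.name=t commit -qam "custom change"
git format-patch -1 -o patches/ > /dev/null
# Roll back to the pristine tree, then re-apply the patch
# (as a generate step would against a fresh submodule checkout):
git reset -q --hard HEAD~1
git apply patches/0001-custom-change.patch
cat file.txt
```

The upside of this approach, as the description notes, is that the customization lives in one reviewable patch file that can be re-applied whenever the llama.cpp submodule pin moves.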

This change may be superseded by packaging llama.cpp directly in the near future.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-22 21:00:33 -05:00

Reference: github-starred/ollama#36329