[PR #6729] [CLOSED] discover/gpu.go: Add Support for Distributed Inferencing #74502

Closed
opened 2026-05-05 06:36:32 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/6729
Author: @ecyht2
Created: 2024-09-10
Status: Closed

Base: main ← Head: feat/rpc


📝 Commits (10+)

  • 7015fce feat: Added support for llama.cpp RPC
  • c234eea doc: Added documentation for distributed inferencing
  • bf80325 feat: Added Memory Check for RPC Servers
  • ed78baa feat: Added option to change RPC servers in HTTP options
  • 4f76c4b doc: Added docs for new API options
  • 056bd69 doc: Updated request for changing RPC server to be generate instead of chat
  • b177dcf Merge remote-tracking branch 'upstream/main' into feat/rpc
  • 8068cd1 Merge remote-tracking branch 'upstream/main' into feat/rpc
  • ca5c567 server/sched.go: Fixed missing legacy gpu module
  • f645eec dicover/gpu.go: Updated RPC communication to support new protocol

📊 Changes

39 files changed (+2216 additions, -3 deletions)

View changed files

📝 CMakeLists.txt (+4 -0)
📝 api/types.go (+2 -0)
📝 discover/gpu.go (+194 -0)
📝 discover/types.go (+21 -0)
📝 docs/api.md (+2 -1)
➕ docs/distributed_inferencing.md (+35 -0)
📝 envconfig/config.go (+2 -0)
📝 llama/llama.go (+9 -0)
➕ llama/patches/0001-ggml-backend-malloc-and-free-using-the-same-compiler.patched (+0 -0)
➕ llama/patches/0002-pretokenizer.patched (+0 -0)
➕ llama/patches/0003-embeddings.patched (+0 -0)
➕ llama/patches/0004-clip-unicode.patched (+0 -0)
➕ llama/patches/0005-solar-pro.patched (+0 -0)
➕ llama/patches/0006-add-mllama-support.patched (+0 -0)
➕ llama/patches/0007-add-unpad-operator.patched (+0 -0)
➕ llama/patches/0008-fix-deepseek-deseret-regex.patched (+0 -0)
➕ llama/patches/0009-maintain-ordering-for-rules-for-grammar.patched (+0 -0)
➕ llama/patches/0010-ensure-KV-cache-is-fully-defragmented.patched (+0 -0)
➕ llama/patches/0011-sort-devices-by-score.patched (+0 -0)
➕ llama/patches/0012-add-phony-target-ggml-cpu-for-all-cpu-variants.patched (+0 -0)

...and 19 more files

📄 Description

This PR adds support for llama.cpp's RPC backend, which enables distributed inferencing across multiple devices.

It aims to implement #4643.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.


Reference: github-starred/ollama#74502