[PR #10844] discover/gpu.go: Add Support for Distributed Inferencing (continued) #12123

Open
opened 2025-11-12 16:29:18 -06:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/10844
Author: @gkpln3
Created: 5/24/2025
Status: 🔄 Open

Base: main ← Head: feat/rpc


📝 Commits (10+)

  • 7015fce feat: Added support for llama.cpp RPC
  • c234eea doc: Added documentation for distributed inferencing
  • bf80325 feat: Added Memory Check for RPC Servers
  • ed78baa feat: Added option to change RPC servers in HTTP options
  • 4f76c4b doc: Added docs for new API options
  • 056bd69 doc: Updated request for changing RPC server to be generate instead of chat
  • b177dcf Merge remote-tracking branch 'upstream/main' into feat/rpc
  • 8068cd1 Merge remote-tracking branch 'upstream/main' into feat/rpc
  • ca5c567 server/sched.go: Fixed missing legacy gpu module
  • f645eec dicover/gpu.go: Updated RPC communication to support new protocol

📊 Changes

46 files changed (+2616 additions, -8 deletions)


📝 CMakeLists.txt (+4 -0)
📝 api/types.go (+2 -0)
📝 cmd/cmd.go (+14 -0)
➕ cmd/rpc_server.go (+48 -0)
📝 discover/gpu.go (+1 -0)
📝 discover/gpu_darwin.go (+1 -1)
➕ discover/gpu_rpc.go (+185 -0)
📝 discover/types.go (+9 -0)
📝 docs/api.md (+2 -1)
➕ docs/distributed_inferencing.md (+37 -0)
📝 envconfig/config.go (+3 -0)
📝 llama/llama.go (+11 -0)
➕ llama/patches/0001-ggml-backend-malloc-and-free-using-the-same-compiler.patched (+0 -0)
➕ llama/patches/0002-pretokenizer.patched (+0 -0)
➕ llama/patches/0003-embeddings.patched (+0 -0)
➕ llama/patches/0004-clip-unicode.patched (+0 -0)
➕ llama/patches/0005-solar-pro.patched (+0 -0)
➕ llama/patches/0006-add-mllama-support.patched (+0 -0)
➕ llama/patches/0007-add-unpad-operator.patched (+0 -0)
➕ llama/patches/0008-fix-deepseek-deseret-regex.patched (+0 -0)

...and 26 more files

📄 Description

This PR builds on top of the work done by @ecyht2 in #6729, following issue #4643.
It adds RPC support to Ollama based on llama.cpp's RPC mechanism, enabling distributed inference across multiple devices.

This PR has been tested and confirmed working on macOS (fixing a race condition in distributed inference). Best performance can be achieved by connecting the devices over Thunderbolt 4.

This PR also adds the ollama rpc command, which runs the RPC server on the other machines.
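Based on the commits above, the intended workflow appears to be: start an RPC worker on each secondary machine, then point the main Ollama instance at those workers. The sketch below illustrates that flow. Only the `ollama rpc` command itself comes from this PR; the listen address, the `OLLAMA_RPC_SERVERS` environment variable, and the `rpc_servers` request option are hypothetical names used for illustration and may not match the actual implementation.

```
# On each worker machine (command added by this PR):
ollama rpc

# On the main machine, list the workers when starting the server.
# The variable name is an assumption for illustration.
OLLAMA_RPC_SERVERS=192.168.1.10:50052,192.168.1.11:50052 ollama serve

# Per the commit history, the RPC servers can alternatively be set in the
# HTTP options of a generate request (field name assumed):
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Hello",
  "options": { "rpc_servers": "192.168.1.10:50052" }
}'
```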


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-12 16:29:18 -06:00

Reference: github-starred/ollama-ollama#12123