[GH-ISSUE #5983] Distributed Computing: Run single large model on multiple machines #29502

Closed
opened 2026-04-22 08:27:02 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @mrmiket64 on GitHub (Jul 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5983

Hello dear Ollama team.

I'd like to request a new feature in Ollama that allows a single large model to be run across multiple machines for inference. This would enable users to take advantage of distributed computing to handle very large models that would exceed the memory capacity of a single machine.

This functionality would allow users to:

  • Run very large models such as Llama 3.1 405B, which are otherwise out of reach
  • Distribute the load across machines when serving several smaller models to a large user base

Thank you for all of your great work, it is really making a difference for me.

All the Best
Miguel

GiteaMirror added the feature request label 2026-04-22 08:27:02 -05:00

@rick-github commented on GitHub (Jul 26, 2024):

Ollama uses [llama.cpp](https://github.com/ggerganov/llama.cpp) for inference, so that's where distributed computing needs to be implemented. There have been attempts to do so, mainly using the [MPI](https://www.geeksforgeeks.org/mpi-distributed-computing-made-easy/) framework, but that work has languished recently.
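To make the point concrete: the distribution mechanism lives in llama.cpp, not in Ollama. Besides the stalled MPI experiments, llama.cpp also ships an experimental RPC backend that splits a model's layers across worker machines. A rough sketch of how that is launched (worker hostnames, ports, and the model file below are placeholders, not tested values):

```shell
# On each worker machine: build llama.cpp with the RPC backend enabled
# and start an rpc-server that exposes that machine's memory/compute.
cmake -B build -DGGML_RPC=ON
cmake --build build --config Release
./build/bin/rpc-server -p 50052

# On the coordinating machine: point llama.cpp at the workers so the
# model's layers are distributed across their combined memory.
./build/bin/llama-cli -m llama-3.1-405b.Q4_K_M.gguf \
    --rpc worker1:50052,worker2:50052 -ngl 99 -p "Hello"
```

Note this is llama.cpp's own tooling, not an Ollama feature; exposing it through Ollama is exactly what this issue requests.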


@dhiltgen commented on GitHub (Jul 26, 2024):

Let's track this via #4643.


Reference: github-starred/ollama#29502