[GH-ISSUE #9207] Can Ollama invoke multiple servers' GPUs simultaneously to run a single model? #5999

Closed
opened 2026-04-12 17:21:05 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @Bob080812 on GitHub (Feb 19, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9207

Can Ollama invoke multiple servers' GPUs simultaneously to run a single model?

GiteaMirror added the question label 2026-04-12 17:21:05 -05:00
Author
Owner

@pdevine commented on GitHub (Feb 19, 2025):

Hi @Bob080812, that's not something Ollama supports out of the box, but you can put it behind something like Kubernetes with Ollama running on different nodes. There are some tutorials out there on how to do it if you do a bit of searching.

I'll go ahead and close the issue, but you can feel free to keep commenting.
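To make the suggested setup concrete: what this describes is request-level load balancing across whole-model replicas, not splitting one model's weights across machines. Below is a minimal sketch of that pattern, assuming two separate servers each running a full Ollama instance (the node addresses and the `llama3` model name are placeholders; in a Kubernetes deployment a Service would typically do this routing for you). It uses Ollama's standard `/api/generate` endpoint on the default port 11434:

```python
import itertools
import requests

# Hypothetical endpoints: two servers, each running its own complete
# Ollama instance. Each node must hold a full copy of the model.
OLLAMA_NODES = [
    "http://192.168.1.10:11434",
    "http://192.168.1.11:11434",
]

# Round-robin iterator over the available nodes.
_node_cycle = itertools.cycle(OLLAMA_NODES)

def generate(prompt: str, model: str = "llama3") -> str:
    """Send a generation request to the next node in the rotation."""
    node = next(_node_cycle)
    resp = requests.post(
        f"{node}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    for question in ["Why is the sky blue?", "What is 2 + 2?"]:
        print(generate(question))
```

Note that each node still needs enough GPU memory to load the entire model on its own; this approach scales request throughput across servers, not the size of model you can run.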


Reference: github-starred/ollama#5999