[GH-ISSUE #8404] Running the same model on all GPUs #5400

Closed
opened 2026-04-12 16:38:01 -05:00 by GiteaMirror · 2 comments

Originally created by @ZanMax on GitHub (Jan 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8404

Is it possible to run the same model on multiple GPUs?
I have a server with 5 GPUs, and I want to run the same model on each GPU to provide more concurrency for users.
I found a workaround: running multiple instances of "ollama serve" on different ports and using haproxy as a load balancer to distribute requests across them; a sketch of this setup follows the examples below.
If this feature is not implemented, could you add an option to run the model on all or specific GPUs?

For example:
ollama run --gpus=all mistral
or
ollama run --gpus=0,1,2 mistral
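
For reference, a minimal sketch of that workaround, assuming NVIDIA GPUs (CUDA_VISIBLE_DEVICES pins each instance to one GPU) and OLLAMA_HOST to give each instance its own port; the port numbers, config filename, and leastconn policy are illustrative choices, not requirements:

# Start one ollama instance per GPU, each bound to its own port.
for gpu in 0 1 2 3 4; do
  CUDA_VISIBLE_DEVICES=$gpu OLLAMA_HOST=127.0.0.1:$((11434 + gpu)) ollama serve &
done

# Write a haproxy config that spreads requests across the five instances,
# with generous timeouts for long generations, then start haproxy.
cat > ollama-haproxy.cfg <<'EOF'
defaults
    mode http
    timeout connect 5s
    timeout client  300s
    timeout server  300s

frontend ollama_front
    bind *:8080
    default_backend ollama_back

backend ollama_back
    balance leastconn
    server gpu0 127.0.0.1:11434 check
    server gpu1 127.0.0.1:11435 check
    server gpu2 127.0.0.1:11436 check
    server gpu3 127.0.0.1:11437 check
    server gpu4 127.0.0.1:11438 check
EOF
haproxy -f ollama-haproxy.cfg

Clients then send requests to port 8080, and haproxy forwards each one to the least-busy instance.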

GiteaMirror added the gpu and feature request labels 2026-04-12 16:38:01 -05:00

@rick-github commented on GitHub (Jan 13, 2025):

https://github.com/ollama/ollama/issues/3902

The usual way to add concurrency is to increase OLLAMA_NUM_PARALLEL; see the FAQ entry on concurrent requests (https://github.com/ollama/ollama/blob/main/docs/faq.md#how-does-ollama-handle-concurrent-requests) for more info.
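
For instance, a minimal sketch, assuming the server reads the variable from its environment (the value 4 is illustrative):

OLLAMA_NUM_PARALLEL=4 ollama serve

Each loaded model can then serve up to four requests concurrently; note that memory use grows with the parallel count, since each slot needs its own share of the context.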

@rick-github commented on GitHub (Jan 28, 2025):

Closing as a duplicate of #3902.
