[GH-ISSUE #10007] ollama uses multiple GPU resources #68616

Closed
opened 2026-05-04 14:37:47 -05:00 by GiteaMirror · 1 comment

Originally created by @cigar-wiki on GitHub (Mar 27, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10007

Is there any way to make the model use multiple GPUs in a container started by Docker? After starting Ollama, `nvidia-smi` on the host shows only one ollama process, running on a single GPU; the other GPUs are not used.
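For reference, a sketch of the standard way to expose host GPUs to the Ollama container with Docker's `--gpus` flag (requires the NVIDIA Container Toolkit; volume and port values follow Ollama's published Docker instructions):

```shell
# Expose all host GPUs to the container
docker run -d --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama

# Or limit the container to specific GPUs by index
docker run -d --gpus '"device=0,1"' \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama
```

If the container was started with `--gpus all` and `nvidia-smi` still shows activity on only one GPU, that is usually the scheduler's choice rather than a visibility problem, as the comment below explains.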

GiteaMirror added the feature request label 2026-05-04 14:37:47 -05:00

@rick-github commented on GitHub (Mar 27, 2025):

Ollama will use multiple GPUs if the model doesn't fit on a single GPU. Multiple GPUs don't increase the inference rate for a single completion, see [here](https://github.com/ollama/ollama/issues/7648#issuecomment-2473561990).

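A sketch of how to ask the scheduler to spread a model across GPUs even when it would fit on one, using the `OLLAMA_SCHED_SPREAD` server environment variable (check the Ollama FAQ for your version to confirm this variable is supported):

```shell
# Start the container with the scheduler told to spread model layers
# across all available GPUs instead of packing onto the fewest GPUs.
docker run -d --gpus all \
  -e OLLAMA_SCHED_SPREAD=1 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama

# After loading a model, verify the placement:
docker exec ollama ollama ps   # shows the GPU/CPU split for loaded models
nvidia-smi                     # shows memory allocated on each GPU
```

Note this changes placement only; as stated above, spreading a single completion across GPUs does not make it faster.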

Reference: github-starred/ollama#68616