[GH-ISSUE #5849] How to force the use of two GPUs to run a model? #65686

Closed
opened 2026-05-03 22:14:32 -05:00 by GiteaMirror · 1 comment

Originally created by @mizzlefeng on GitHub (Jul 22, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5849

I have reviewed many issues, including [#4198](https://github.com/ollama/ollama/issues/4198), [#4517](https://github.com/ollama/ollama/pull/4517), and others.
The explanation given is that if a single GPU has enough VRAM to hold the current model, Ollama will not use additional GPUs. But what should I do if I want to force the model to be split evenly across two GPUs? Even setting OLLAMA_NUM_PARALLEL to 2 had no effect; only one GPU was used in the end.
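
This is roughly how I set it (a minimal sketch, assuming the server is launched from a Linux shell rather than as a systemd service):

```sh
# Allow 2 parallel requests per loaded model. This only raises request
# concurrency; it does not change how the model is placed on GPUs.
OLLAMA_NUM_PARALLEL=2 ollama serve
```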

GiteaMirror added the question label 2026-05-03 22:14:32 -05:00

@wrapss commented on GitHub (Jul 22, 2024):

Try using `OLLAMA_SCHED_SPREAD=1`.
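
For example (a minimal sketch, assuming a Linux shell and NVIDIA GPUs; `CUDA_VISIBLE_DEVICES` is only needed if you want to pin which two GPUs are used):

```sh
# Ask the scheduler to spread the model across all visible GPUs
# instead of packing it onto the first GPU that fits.
CUDA_VISIBLE_DEVICES=0,1 OLLAMA_SCHED_SPREAD=1 ollama serve

# In another terminal: both GPUs should now show memory allocated
# for the model.
nvidia-smi
```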

