[GH-ISSUE #7399] set gpu/cpu affinity per-model #51216

Closed
opened 2026-04-28 18:56:11 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @xucian on GitHub (Oct 28, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7399

Is it possible to set the target device (gpu0, gpu1, cpu) per-model? That would be a game-changer, as we could offload smaller models to CPU while keeping bigger models on GPU, basically avoiding the warmup.

GiteaMirror added the feature request label 2026-04-28 18:56:11 -05:00

@rick-github commented on GitHub (Oct 28, 2024):

Not currently. #3902 tracks the work on model management, but there has been little progress so far. As a limited workaround, you can force a model onto CPU by setting `num_gpu` to zero; see https://github.com/ollama/ollama/issues/6950#issuecomment-2373663650 for details.
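For context on the workaround: `num_gpu` controls how many model layers are offloaded to the GPU, so setting it to 0 keeps the whole model on CPU. A minimal sketch of a request body for Ollama's `/api/generate` endpoint with this option (the model name and prompt here are illustrative, not from the issue):

```python
import json

# Sketch: a request body for Ollama's /api/generate endpoint that forces
# CPU-only inference by setting num_gpu (GPU-offloaded layer count) to 0.
# Model name and prompt are placeholders.
body = {
    "model": "llama3.2",
    "prompt": "Why is the sky blue?",
    "options": {
        "num_gpu": 0,  # 0 layers on GPU -> model runs entirely on CPU
    },
}

payload = json.dumps(body)
print(payload)
```

The payload could then be sent to a running Ollama server, e.g. `curl http://localhost:11434/api/generate -d '<payload>'`. The same effect can be baked into a model via a Modelfile line `PARAMETER num_gpu 0`, which is the closest thing to a per-model device setting available today.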


Reference: github-starred/ollama#51216