[GH-ISSUE #10531] Request for Manual GPU Assignment in Ollama: Feature Proposal to Specify Target GPUs for Model Loading #6928

Closed
opened 2026-04-12 18:49:17 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @varyagnord on GitHub (May 2, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10531

Hello everyone!

I have a task where I need to load six or more models simultaneously, assigning each one strictly to a specific GPU. For instance, models A, C, and D should be loaded onto GPU 0, while models B, E, and F must go to another GPU (e.g., GPU 1). I need to accomplish this on Linux.

I know that currently Ollama evaluates the performance of each GPU installed in the system and automatically selects the most suitable GPUs for loading models. However, there is no parameter to override that GPU selection algorithm when loading a model. Ideally, such a parameter would be available when the model is first loaded into memory.

I believe this feature would be useful to more people than just myself. Perhaps such a parameter already exists and I simply wasn't aware of it or couldn't find any related information.

GiteaMirror added the feature request label 2026-04-12 18:49:17 -05:00

@rick-github commented on GitHub (May 2, 2025):

See #3902 for related work. The only way to do this at the moment is with multiple ollama servers.
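A minimal sketch of that multi-server workaround, assuming NVIDIA GPUs: each ollama instance is pinned to one GPU with `CUDA_VISIBLE_DEVICES` and given its own address via `OLLAMA_HOST`. The ports and model names below are illustrative, not prescribed by Ollama.

```shell
# Instance for GPU 0 (would serve models A, C, D).
CUDA_VISIBLE_DEVICES=0 OLLAMA_HOST=127.0.0.1:11434 ollama serve &

# Instance for GPU 1 (would serve models B, E, F).
CUDA_VISIBLE_DEVICES=1 OLLAMA_HOST=127.0.0.1:11435 ollama serve &

# Clients then point OLLAMA_HOST at the instance that owns the model:
OLLAMA_HOST=127.0.0.1:11435 ollama run model-b "hello"
```

Each server only sees its own GPU, so whatever it loads necessarily lands there; the cost is that every instance maintains its own model cache and port.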


@chnxq commented on GitHub (May 4, 2025):

You could consider starting multiple runner instances.
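With multiple instances running, the remaining piece is routing each request to the instance pinned to the right GPU. A minimal client-side sketch, where the model names, ports, and mapping are all hypothetical assumptions rather than anything Ollama provides:

```python
# Map each model to the base URL of the ollama instance whose GPU
# should host it. This table is an illustrative assumption.
MODEL_TO_SERVER = {
    "model-a": "http://127.0.0.1:11434",  # instance pinned to GPU 0
    "model-c": "http://127.0.0.1:11434",
    "model-d": "http://127.0.0.1:11434",
    "model-b": "http://127.0.0.1:11435",  # instance pinned to GPU 1
    "model-e": "http://127.0.0.1:11435",
    "model-f": "http://127.0.0.1:11435",
}

def server_for(model: str) -> str:
    """Return the base URL of the instance that should load `model`."""
    try:
        return MODEL_TO_SERVER[model]
    except KeyError:
        raise ValueError(f"no GPU assignment configured for {model!r}")
```

A client would then send its API calls for `model-b` to `server_for("model-b")`, so the model is only ever loaded by the server that can see GPU 1.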


Reference: github-starred/ollama#6928