[GH-ISSUE #2156] I want to run Ollama on a limited number of GPUs and CPUs #63269

Closed
opened 2026-05-03 12:47:34 -05:00 by GiteaMirror · 3 comments

Originally created by @sfarzi on GitHub (Jan 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2156

I have a machine with 4 GPUs and 16 CPUs, but I want to run Ollama on just one GPU and 8 CPUs. How can I do this?

@easp commented on GitHub (Jan 23, 2024):

Manually setting `num_thread` in a Modelfile will limit the cores used. Limiting GPUs is more complicated. I think this will work (assuming you are using NVIDIA):
https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
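
To make that concrete, a minimal sketch of both approaches (the model name, tag, and thread count below are illustrative, not from the thread):

```shell
# Cap CPU threads by baking num_thread into a Modelfile
cat > Modelfile <<'EOF'
FROM llama2
PARAMETER num_thread 8
EOF
ollama create llama2-8threads -f Modelfile

# Expose only the first GPU to the server process
CUDA_VISIBLE_DEVICES=0 ollama serve
```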

@jukofyork commented on GitHub (Jan 24, 2024):

There is already the option to pass through the `main_gpu` option to the wrapped llama.cpp server, but the patch to pass through the `tensor_split` option (https://github.com/ollama/ollama/pull/1256) seems to be stuck and says: "This branch has conflicts that must be resolved".

Somebody in that thread replied that the patch works fine though.
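
For reference, `main_gpu` can already be set per request through the API's `options` object; a hedged sketch (the model name and prompt are placeholders):

```shell
# Route the main computation to GPU 0 via the main_gpu option
# (maps to llama.cpp's main-gpu setting; model name is an example)
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "options": { "main_gpu": 0 }
}'
```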

@dhiltgen commented on GitHub (Mar 12, 2024):

I've updated our docs to show how to limit GPUs.

You can experiment with `taskset` for CPU limiting, but I would suggest using something like `nice` to adjust the priority of the process if you're trying to avoid starving other workloads of CPU cycles.
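
A sketch of both suggestions, assuming a Linux host (the core range and niceness value are arbitrary examples):

```shell
# Hard-pin the server to 8 of the 16 CPUs (cores 0-7)
taskset -c 0-7 ollama serve

# Alternatively, keep all cores available but lower the process
# priority so other workloads are not starved
nice -n 10 ollama serve
```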


Reference: github-starred/ollama#63269