[GH-ISSUE #6453] Inconsistent GPU Usage #50570

Closed · opened 2026-04-28 16:24:59 -05:00 by GiteaMirror · 4 comments

Originally created by @gru3zi on GitHub (Aug 21, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6453

What is the issue?

I have been happily using Ollama for some time with my dual RTX 3090s and an NVLink adapter. Recently I've been finding the output to be quite slow. After checking the outputs of both `ollama ps` and `nvidia-smi`, I found that my GPUs are no longer being fully utilised; the load seems to split between the CPU and GPUs.

If I run different models of the same size, some show full GPU usage while others don't.

![image](https://github.com/user-attachments/assets/4e2746ca-e8dc-4051-935b-5a99c2344600)

![2024-08-21_15-48](https://github.com/user-attachments/assets/d1099028-3146-4292-96ae-47151589f487)

My service file, in which I set `Environment=CUDA_VISIBLE_DEVICES`; I had also run `sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm`:

![image](https://github.com/user-attachments/assets/9f0254c1-2db2-43e7-b5f1-01e94d1ee528)

Is there a way to stop CPU usage altogether?
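(For reference, `ollama ps` reports the split in its PROCESSOR column: a fully offloaded model shows `100% GPU`, while a partial offload shows something like the line below. The model name, ID, and figures here are illustrative, not taken from this system.)

```
$ ollama ps
NAME            ID              SIZE     PROCESSOR          UNTIL
llama3.1:70b    a1b2c3d4e5f6    42 GB    25%/75% CPU/GPU    4 minutes from now
```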

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

ollama version is 0.3.6

GiteaMirror added the bug label 2026-04-28 16:24:59 -05:00

@rick-github commented on GitHub (Aug 21, 2024):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may give insight into how ollama is scheduling the models. Based on the screenshots it looks like it's only detecting one GPU.
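On a systemd install, the server log can typically be followed with `journalctl` (assuming the default `ollama` unit name from the Linux installer):

```
# Stream the Ollama server log; the model-load lines show how layers
# are placed across GPUs and CPU
journalctl -u ollama -f
```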


@gru3zi commented on GitHub (Aug 21, 2024):

I made some changes and now it uses all my GPUs again.

```
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:>
CUDA_VISIBLE_DEVICES=0,1
OLLAMA_SCHED_SPREAD=1

[Install]
WantedBy=default.target
```


@rick-github commented on GitHub (Aug 21, 2024):

If your intent is to pass the variables `CUDA_VISIBLE_DEVICES` and `OLLAMA_SCHED_SPREAD` to the ollama server, you need to add them as `Environment` directives.

Environment="CUDA_VISIBLE_DEVICES=0,1"
Environment="OLLAMA_SCHED_SPREAD=1"

@mxyng commented on GitHub (Aug 21, 2024):

The problem seems to be the content of CUDA_VISIBLE_DEVICES in your original systemd configuration.

`CUDA_VISIBLE_DEVICES` (and `HIP_VISIBLE_DEVICES`) should be comma-separated *without* spaces. Your original line contained a space between the comma and the next device.

Environment="CUDA_VISIBLE_DEVICES=GPU-986c10fb-51e6-ba79-67dd-fd9e95d31034, GPU-d742ac89-cd44-c238>
Environment="CUDA_VISIBLE_DEVICES=GPU-986c10fb-51e6-ba79-67dd-fd9e95d31034,GPU-d742ac89-cd44-c238>