[GH-ISSUE #11356] multi-GPU issue on SLURM with cgroups enabled #69550

Closed
opened 2026-05-04 18:26:44 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @scimerman on GitHub (Jul 10, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11356

What is the issue?

I have a cluster that contains nodes with multiple GPUs, in our case 8 GPUs per node. Slurm neatly exposes only the requested GPUs to a job. For instance, running a simple srun -N 1 -n 1 --gres=gpu:a40:2 ... will reserve one node with two GPUs.
Now the magic happens with cgroups. When the job spins up, the control groups show only the GPU devices that the user can actually use. Let's say that out of all available GPUs (/dev/nvidia[0-7]), the job gets the devices /dev/nvidia[1,2]. Therefore nvidia-smi will simply report:

$ srun --qos regular -N 1 -n 1 --gres=gpu:a40:2 -t 08:00:00 --mem 38480M --pty bash -i
srun: job 1479307 queued and waiting for resources
srun: job 1479307 has been allocated resources
$ nvidia-smi 
Thu Jul 10 10:23:48 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40                     On  |   00000000:00:0A.0 Off |                    0 |
|  0%   28C    P8             11W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A40                     On  |   00000000:00:0B.0 Off |                    0 |
|  0%   28C    P8             16W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
$ echo $CUDA_VISIBLE_DEVICES 
0,1
$ echo $SLURM_STEP_GPUS 
1,2
$ 

These last two variables are both actually completely correct. But in my test ollama then simply runs on the CPU and not on the GPU. To me this is obviously because it goes for the wrong CUDA devices: devices 0 and 1 are not the correct ones.
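
To double-check which physical boards the in-job indices refer to, the index-to-UUID/bus-id mapping can be queried inside the job (plain nvidia-smi, nothing ollama-specific; output omitted here):

$ nvidia-smi --query-gpu=index,name,pci.bus_id,uuid --format=csv,noheader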

So I ran:

$ ml ollama/0.9.5
$ ollama serve > ollama.log 2>&1
$ echo "what is the speed of light?" | OLLAMA_HOST="0.0.0.0:11435" CUDA_VISIBLE_DEVICES=${SLURM_STEP_GPUS} ollama run deepseek-r1:14b > ollama.reply

The most obvious approach would be to simply remove cgroups, but cgroups work fine together with SLURM for all other software. There has to be something wrong with how ollama detects the available devices.

Changing CUDA_VISIBLE_DEVICES manually from 0,1 to 1,2, or to anything else, does not work. Prepending CUDA_VISIBLE_DEVICES=[with various values] in front of ollama serve does not work. Setting it to a GPU unique device ID does not work either.
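
For illustration, the serve-side variants I mean look roughly like this (the UUID is just a placeholder, not a value from this node):

$ CUDA_VISIBLE_DEVICES=1,2 ollama serve > ollama.log 2>&1
$ CUDA_VISIBLE_DEVICES=0,1 ollama serve > ollama.log 2>&1
$ CUDA_VISIBLE_DEVICES=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx ollama serve > ollama.log 2>&1

None of them make the model run on the GPUs.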

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.9.5 and/or 0.9.3

GiteaMirror added the bug label 2026-05-04 18:26:44 -05:00
Author
Owner

@rick-github commented on GitHub (Jul 10, 2025):

Unset ROCR_VISIBLE_DEVICES.

Author
Owner

@scimerman commented on GitHub (Jul 10, 2025):

Thank you Rick.
It seems that unset ROCR_VISIBLE_DEVICES really does help... I am super confused as to why a ROCR (Radeon Open Compute platform) variable would control how ollama runs on NVIDIA devices 🤨 But it seems to be working. Let me double-check and confirm that this solution works, and then I'll close the ticket.

Thanks again.

Author
Owner

@scimerman commented on GitHub (Jul 10, 2025):

I can confirm: your approach, Rick, seems to resolve the issue ✔️

I am still surprised that an AMD variable works for CUDA devices, while unsetting CUDA_VISIBLE_DEVICES does not...

In either case, I would recommend updating the documentation at https://github.com/ollama/ollama/blob/main/docs/faq.md accordingly.
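
For the record, the sequence that works for me inside the job now looks roughly like this (same module, port and model as above; serve is backgrounded here only for brevity):

$ ml ollama/0.9.5
$ unset ROCR_VISIBLE_DEVICES
$ OLLAMA_HOST="0.0.0.0:11435" ollama serve > ollama.log 2>&1 &
$ echo "what is the speed of light?" | OLLAMA_HOST="0.0.0.0:11435" ollama run deepseek-r1:14b > ollama.reply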

Reference: github-starred/ollama#69550