[GH-ISSUE #3095] Limit ollama usage of GPUs using CUDA_VISIBLE_DEVICES #48411

Closed
opened 2026-04-28 08:08:24 -05:00 by GiteaMirror · 13 comments

Originally created by @fengbolan on GitHub (Mar 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3095

Originally assigned to: @dhiltgen on GitHub.

I've read the updated docs. The previous issue regarding the inability to limit Ollama's usage of GPUs using CUDA_VISIBLE_DEVICES has not been resolved. Despite setting the environment variable CUDA_VISIBLE_DEVICES to a specific range or list of GPU IDs, Ollama continues to use all available GPUs instead of only the specified ones.

GiteaMirror added the nvidia label 2026-04-28 08:08:24 -05:00

@Tipizen commented on GitHub (Mar 13, 2024):

I've been able to use this to control which GPU Ollama runs on:

export OLLAMA_HOST="localhost:1234"
CUDA_VISIBLE_DEVICES=0 ollama serve
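
For reference, here is that setup end to end as a rough sketch (the port 1234 comes from the comment above; the model name is only a placeholder):

```
# Start the server on a custom port, pinned to GPU 0
OLLAMA_HOST="localhost:1234" CUDA_VISIBLE_DEVICES=0 ollama serve &

# Point the client at the same host/port; replace llama2 with whatever model you use
OLLAMA_HOST="localhost:1234" ollama run llama2
```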


@dhiltgen commented on GitHub (Mar 13, 2024):

@fengbolan can you share a little more information about your setup? Are you running Linux or Windows? If Linux, are you using systemd to run Ollama based on our install script, or did you install via some other packaging system or process? How did you pass the variable to the server? Can you also set OLLAMA_DEBUG=1 and share a server log from startup so we can see what it's doing and maybe get some more insight into why it's not using just the GPUs you specify?


@BruceMacD commented on GitHub (Mar 13, 2024):

Just a note that if you're on Linux, you probably need to add the environment variable to the background service rather than to your current terminal session:
https://github.com/ollama/ollama/blob/main/docs/linux.md#adding-ollama-as-a-startup-service-recommended
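
For the systemd service case, that typically looks something like this (a sketch; adjust the device index to your setup):

```
# Add an override for the ollama service
sudo systemctl edit ollama.service

# In the editor that opens, add:
#   [Service]
#   Environment="CUDA_VISIBLE_DEVICES=0"

# Reload units and restart the service so it picks up the variable
sudo systemctl daemon-reload
sudo systemctl restart ollama
```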


@lufixSch commented on GitHub (Mar 13, 2024):

I have this issue on ROCm. On ROCm you can use either `ROCM_VISIBLE_DEVICES` or `CUDA_VISIBLE_DEVICES`.
I start Ollama with `CUDA_VISIBLE_DEVICES=0 ollama serve` but it still detects both of my GPUs.


@PlanetMacro commented on GitHub (Mar 26, 2024):

CUDA_VISIBLE_DEVICES=0 ollama run <MODEL> doesn't work for me on Debian. It distributes the load almost uniformly over all CUDA GPUs.

Can you post in the README how to specify CUDA devices? It's not clear to me how it's supposed to be handled. For larger servers with multiple GPUs, or complex setups with other software running on some GPUs (like in speech-to-speech applications using Whisper), fine-grained control over GPU loads is mandatory.


@dhiltgen commented on GitHub (Mar 26, 2024):

@PlanetMacro we've recently updated the GPU docs to include an explanation. https://github.com/ollama/ollama/blob/main/docs/gpu.md#gpu-selection


@PlanetMacro commented on GitHub (Mar 26, 2024):

In this case there might be a bug. The system does not change its GPU usage for me; it always uses all GPUs even if I do something like 'CUDA_VISIBLE_DEVICES=-1 ollama run mixtral'.

If I can help with debugging, I might need some instructions.


@dhiltgen commented on GitHub (Mar 26, 2024):

@PlanetMacro I'm not sure exactly what your objective is, but assuming you have a 2+ GPU system and you're trying to get Ollama to run on a specific GPU, please give the following a shot and share the logs.

sudo systemctl stop ollama

nvidia-smi -L
<note the UUID and replace the one below with yours>

CUDA_VISIBLE_DEVICES=GPU-452cac9f-6960-839c-4fb3-0cec83699196 OLLAMA_DEBUG=1 ollama serve

@PlanetMacro commented on GitHub (Mar 27, 2024):

user@debian-ai:~$ sudo systemctl stop ollama
[sudo] password for user:

user@debian-ai:~$ nvidia-smi -L
GPU 0: Tesla M40 24GB (UUID: GPU-81c595e9-30fa-224b-5e8e-8c32c13d7782)
GPU 1: Tesla M40 24GB (UUID: GPU-6e247760-9d88-5fa3-1849-0f693b2aaf51)
GPU 2: Tesla M40 24GB (UUID: GPU-b13936b5-d8f7-d683-5702-40bc7c2f36ca)
GPU 3: Tesla M40 24GB (UUID: GPU-15f4bee9-7c34-da21-cda0-eb1c8d0c2288)
GPU 4: Tesla M40 24GB (UUID: GPU-c904a829-2dfb-593f-0721-d62a3ebc9cd2)

user@debian-ai:~$ CUDA_VISIBLE_DEVICES=GPU-81c595e9-30fa-224b-5e8e-8c32c13d7782 OLLAMA_DEBUG=1 ollama serve
time=2024-03-27T10:15:47.929+01:00 level=INFO source=images.go:710 msg="total blobs: 0"
time=2024-03-27T10:15:47.929+01:00 level=INFO source=images.go:717 msg="total unused blobs removed: 0"
time=2024-03-27T10:15:47.929+01:00 level=INFO source=routes.go:1019 msg="Listening on 127.0.0.1:11434 (version 0.1.27)"
time=2024-03-27T10:15:47.929+01:00 level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-27T10:15:51.471+01:00 level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cuda_v11 rocm_v6 rocm_v5 cpu_avx cpu cpu_avx2]"
time=2024-03-27T10:15:51.471+01:00 level=DEBUG source=payload_common.go:147 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-03-27T10:15:51.471+01:00 level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-03-27T10:15:51.471+01:00 level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-27T10:15:51.471+01:00 level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /home/user/libnvidia-ml.so*]"
time=2024-03-27T10:15:51.473+01:00 level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.550.54.14]"
wiring nvidia management library functions in /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.550.54.14
dlsym: nvmlInit_v2
dlsym: nvmlShutdown
dlsym: nvmlDeviceGetHandleByIndex
dlsym: nvmlDeviceGetMemoryInfo
dlsym: nvmlDeviceGetCount_v2
dlsym: nvmlDeviceGetCudaComputeCapability
dlsym: nvmlSystemGetDriverVersion
dlsym: nvmlDeviceGetName
dlsym: nvmlDeviceGetSerial
dlsym: nvmlDeviceGetVbiosVersion
dlsym: nvmlDeviceGetBoardPartNumber
dlsym: nvmlDeviceGetBrand
CUDA driver version: 550.54.14
time=2024-03-27T10:15:51.483+01:00 level=INFO source=gpu.go:99 msg="Nvidia GPU detected"
time=2024-03-27T10:15:51.483+01:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
[0] CUDA device name: Tesla M40 24GB
[0] CUDA part number: 900-2G600-0010-000
[0] CUDA S/N: 0324516023641
[0] CUDA vbios version: 84.00.56.00.03
[0] CUDA brand: 2
[0] CUDA totalMem 25769803776
[0] CUDA usedMem 25654198272
[1] CUDA device name: Tesla M40 24GB
[1] CUDA part number: 900-2G600-0010-000
[1] CUDA S/N: 0323716037129
[1] CUDA vbios version: 84.00.56.00.03
[1] CUDA brand: 2
[1] CUDA totalMem 25769803776
[1] CUDA usedMem 25654198272
[2] CUDA device name: Tesla M40 24GB
[2] CUDA part number: 900-2G600-0010-000
[2] CUDA S/N: 0323416024398
[2] CUDA vbios version: 84.00.56.00.03
[2] CUDA brand: 2
[2] CUDA totalMem 25769803776
[2] CUDA usedMem 25654198272
[3] CUDA device name: Tesla M40 24GB
[3] CUDA part number: 900-2G600-0010-000
[3] CUDA S/N: 0324516103957
[3] CUDA vbios version: 84.00.56.00.03
[3] CUDA brand: 2
[3] CUDA totalMem 24159191040
[3] CUDA usedMem 24046796800
[4] CUDA device name: Tesla M40 24GB
[4] CUDA part number: 900-2G600-2210-000
[4] CUDA S/N: 0322516058117
[4] CUDA vbios version: 84.00.56.00.03
[4] CUDA brand: 2
[4] CUDA totalMem 24159191040
[4] CUDA usedMem 24046796800
time=2024-03-27T10:15:51.511+01:00 level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 5.2"
time=2024-03-27T10:15:51.511+01:00 level=DEBUG source=gpu.go:254 msg="cuda detected 5 devices with 107336M available memory"


user@debian-ai:~$ nvidia-smi
Wed Mar 27 10:19:02 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla M40 24GB On | 00000000:05:00.0 Off | Off |
| N/A 25C P8 17W / 250W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla M40 24GB On | 00000000:21:00.0 Off | Off |
| N/A 25C P8 18W / 250W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla M40 24GB On | 00000000:22:00.0 Off | Off |
| N/A 24C P8 17W / 250W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla M40 24GB On | 00000000:41:00.0 Off | 0 |
| N/A 26C P8 17W / 250W | 0MiB / 23040MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 Tesla M40 24GB On | 00000000:61:00.0 Off | 0 |
| N/A 24C P8 16W / 250W | 0MiB / 23040MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

AFAIK, the CUDA_VISIBLE_DEVICES environment variable is read by the CUDA driver, so it needs to be set before the CUDA driver is initialized. However, as you can see, no process is using any GPU (no Xorg or anything).


@dhiltgen commented on GitHub (Mar 27, 2024):

It looks like you didn't try to load a model. I believe it should use just that one GPU; however, you have bumped into another known defect, #1514, where we incorrectly include the VRAM of all the GPUs in this scenario. Assuming it does correctly use only the GPU you specified, you can work around the memory calculation bug by setting OLLAMA_MAX_VRAM=<bytes> to bypass the automatic calculations.
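
As a sketch of that workaround (the byte value is only an illustration, roughly 22 GiB; the GPU UUID is the one from the log above):

```
# Pin Ollama to a single GPU and cap the VRAM it assumes is available (value in bytes)
CUDA_VISIBLE_DEVICES=GPU-81c595e9-30fa-224b-5e8e-8c32c13d7782 \
OLLAMA_MAX_VRAM=23622320128 \
OLLAMA_DEBUG=1 ollama serve
```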


@PlanetMacro commented on GitHub (Mar 28, 2024):

Ah right. Loading the model indeed only used the specified GPUs. Hence this approach solved it for me.


@dhiltgen commented on GitHub (Apr 12, 2024):

It sounds like we can mark this one closed. Be aware that PR #3418 will further refine our multi-GPU handling and enable multiple models to be loaded across a pool of GPUs which likely maps to your use-case.


@ksaadDE commented on GitHub (Nov 4, 2025):

If someone needs help grabbing AMD GPU UUIDs:
https://github.com/ollama/ollama/issues/12945

(hint: only ROCR_VISIBLE_DEVICES works for limiting or isolating GPUs, i.e. selecting only one or a few)
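
On AMD/ROCm that typically looks something like this (a sketch; the index 0 is only a placeholder, and per the linked issue a device UUID can be used as well):

```
# Restrict Ollama to the first AMD GPU
ROCR_VISIBLE_DEVICES=0 ollama serve
```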

Reference: github-starred/ollama#48411