[GH-ISSUE #8546] I have multiple GPUs, but I cannot use them all. #5514

Closed
opened 2026-04-12 16:45:57 -05:00 by GiteaMirror · 5 comments

Originally created by @dayphosphor on GitHub (Jan 23, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8546

What is the issue?

I have four 4090 GPUs.

![Image](https://github.com/user-attachments/assets/6edf430c-a927-4010-a46b-6ffaa8288f30)

But only one GPU is used.

I have configured the environment variables in /etc/systemd/system/ollama.service:

![Image](https://github.com/user-attachments/assets/2c923009-c235-49cc-a126-25527ca964c2)
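
A quick way to confirm which variables the running service actually picked up after a restart (a sketch; assumes the unit is named `ollama`):

```sh
# List the Environment= entries systemd resolved for the unit
systemctl show ollama --property=Environment

# Restart and follow the logs to see which GPUs the server discovers
sudo systemctl restart ollama
journalctl -u ollama -f
```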

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 16:45:57 -05:00

@rick-github commented on GitHub (Jan 23, 2025):

If the model fits on one GPU, that's all that will be used - multiple GPUs don't confer a performance increase, see [here](https://github.com/ollama/ollama/issues/7648#issuecomment-2473561990). If you want to force ollama to use all GPUs for a model, set [`OLLAMA_SCHED_SPREAD`](https://github.com/ollama/ollama/blob/ca2f9843c8c71491d5abf626c73508e5a1685cea/envconfig/config.go#L251).
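
For anyone finding this later: one way to set the variable when Ollama runs under systemd is a drop-in override. A minimal sketch (the drop-in filename is arbitrary, and `1` is assumed here as the enabling value):

```sh
# Enable OLLAMA_SCHED_SPREAD via a systemd drop-in so package upgrades
# don't overwrite it (editing the unit file directly is fragile)
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
[Service]
Environment="OLLAMA_SCHED_SPREAD=1"
EOF

# Reload unit definitions and restart the server so the flag takes effect
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

`sudo systemctl edit ollama.service` achieves the same thing by opening an editor on that override file.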


@UberMetroid commented on GitHub (Jan 23, 2025):

This is solved and should be marked closed.


@yxc0915 commented on GitHub (Jan 23, 2025):

The 40 series doesn't support SLI anyway; if these were 3090 Ti cards, multi-GPU scheduling might still be possible.


@UberMetroid commented on GitHub (Jan 24, 2025):

> The 40 series doesn't support SLI anyway; if these were 3090 Ti cards, multi-GPU scheduling might still be possible.

With [OLLAMA_SCHED_SPREAD](https://github.com/ollama/ollama/blob/ca2f9843c8c71491d5abf626c73508e5a1685cea/envconfig/config.go#L251) you don't need SLI to use multiple GPUs.
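
A quick way to verify the spread once the flag is set (a sketch; assumes the NVIDIA driver utilities are installed):

```sh
# With the spread active, memory should be allocated on every card,
# not just GPU 0
nvidia-smi --query-gpu=index,name,memory.used --format=csv

# Ollama's own view of the loaded model (PROCESSOR should read "100% GPU")
ollama ps
```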


@mmdyu commented on GitHub (Feb 15, 2025):

> If the model fits on one GPU, that's all that will be used - multiple GPUs don't confer a performance increase, see [here](https://github.com/ollama/ollama/issues/7648#issuecomment-2473561990). If you want to force ollama to use all GPUs for a model, set [`OLLAMA_SCHED_SPREAD`](https://github.com/ollama/ollama/blob/ca2f9843c8c71491d5abf626c73508e5a1685cea/envconfig/config.go#L251).

After adding it, the total utilization across all the GPUs still doesn't exceed 100%:

![Image](https://github.com/user-attachments/assets/d4d84e13-9591-44bc-8c7a-056ed6951f59)
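
That reading is consistent with how a layer split behaves (my interpretation, not confirmed in this thread): each GPU holds a slice of the layers and processes a token in turn, so the combined utilization averages out to roughly one GPU's worth. Watching it live makes the hand-off visible:

```sh
# Refresh per-GPU utilization once a second while a model is generating;
# with the layers split, each card is busy only part of the time
watch -n 1 nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv
```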

Reference: github-starred/ollama#5514