[GH-ISSUE #4696] Problems with using multiple GPUs #2957

Closed
opened 2026-04-12 13:20:04 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @yuwencool on GitHub (May 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4696

What is the issue?

When CUDA_VISIBLE_DEVICES=1,2 is set and Ollama loads a model, it only uses GPU 1. GPU 2 is used only when Ollama runs another model. As a result, if Ollama has parallelism enabled, parallel inference for the same model runs only on GPU 1, and throughput is very slow. How can Ollama use multiple GPUs for inference with one model?
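For reference, the setup being described looks roughly like the sketch below. The GPU indices and the `OLLAMA_NUM_PARALLEL` value are illustrative, not taken from the report:

```shell
# Expose only GPUs 1 and 2 to the Ollama server (indices are illustrative)
export CUDA_VISIBLE_DEVICES=1,2

# Allow several requests to the same model in parallel
# (value chosen for illustration)
export OLLAMA_NUM_PARALLEL=4

ollama serve

# In another terminal, watch per-GPU memory and utilization
# while a model is answering requests:
nvidia-smi -l 1
```

With this configuration, the reporter observes the model's weights and all parallel requests landing on the first visible GPU only.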

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.1.38

GiteaMirror added the bug label 2026-04-12 13:20:05 -05:00
Author
Owner

@alsavu commented on GitHub (May 29, 2024):

Hello, since this was labeled as a BUG less than 12 hours ago, would it be safe to assume that this issue is still present in version 0.1.39 for Windows?

Author
Owner

@matbeedotcom commented on GitHub (May 31, 2024):

I know this may not be what you're looking for, but multi-GPU works perfectly fine for me in Docker, so running Ollama in Docker could avoid this issue for you.

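The Docker workaround would look roughly like this. The base command follows Ollama's published Docker instructions; restricting it to two specific GPUs via `--gpus '"device=1,2"'` (standard Docker device-selection syntax, requiring the NVIDIA Container Toolkit) and the model name are illustrative assumptions:

```shell
# Start the Ollama container with only GPUs 1 and 2 visible
# (device indices and container name are illustrative)
docker run -d --gpus '"device=1,2"' \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama

# Run a model inside the container (model name is an example)
docker exec -it ollama ollama run llama3
```

Whether the in-container scheduler actually spreads one model across both GPUs will still depend on the Ollama version and model size, so this is a sketch of the commenter's suggestion rather than a confirmed fix.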
Author
Owner

@dhiltgen commented on GitHub (May 31, 2024):

Duplicate of #4198


Reference: github-starred/ollama#2957