[GH-ISSUE #9810] Support Multiple GPU? #68476

Closed
opened 2026-05-04 14:06:27 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @ROBODRILL on GitHub (Mar 17, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9810

I used the ollama deepseek-r1:32b Q4_K_M model. My GPU environment is 8× Tesla T4 (Driver Version: 470.57.02, CUDA Version: 11.4), each with 16GB of memory. The model is very slow to answer questions, and when I run "nvidia-smi", every GPU shows about 4GB of memory in use, 32GB in total.
I had learned that Q4_K_M only uses 8GB;
I don't know where the problem is.
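
For anyone reproducing this, per-GPU usage can be checked with `nvidia-smi` (the command is `nvidia-smi`, not `nvidia-sim`); a minimal sketch:

```sh
# Show memory usage per GPU; with a model split across 8 cards you would
# expect to see a few GB in use on each one, as described above.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
```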

GiteaMirror added the feature request label 2026-05-04 14:06:27 -05:00

@rick-github commented on GitHub (Mar 17, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will help in debugging.


@ROBODRILL commented on GitHub (Mar 17, 2025):

> [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will help in debugging.

ollama is running in Docker; the container log only shows the request info.
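
For a Docker deployment, the full server log (including the memory calculations mentioned below) normally goes to the container's stdout/stderr; a minimal sketch, assuming the container is named `ollama`:

```sh
# Dump the server log and pull out the memory/GPU-assignment lines.
docker logs ollama 2>&1 | grep -iE "memory|gpu"

# If the log is too sparse, recreate the container with debug logging on
# (OLLAMA_DEBUG is described in ollama's troubleshooting docs).
docker run -d --gpus all -e OLLAMA_DEBUG=1 --name ollama \
  -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
```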


@rick-github commented on GitHub (Mar 17, 2025):

The log will show memory calculations and GPU assignment.


@jiusi9 commented on GitHub (Mar 18, 2025):

I'm looking for an answer as well.

Can a model run on multiple GPUs to support tensor parallelism?


@ROBODRILL commented on GitHub (Mar 19, 2025):

Thank you. I looked at the container's log. I used the DeepSeek-R1:32b Q4_K_M model, and the log shows "memory.required.full=33.4GiB". I couldn't understand this; I had learned that the Q4 model only needs 8GiB.
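
For context, a rough back-of-the-envelope calculation (my own estimate, not ollama's exact accounting: ~32B parameters at roughly 4.85 bits per weight for Q4_K_M) shows why the figure is far above 8GiB:

```sh
# Approximate weight memory for a 32B-parameter model at Q4_K_M.
# 4.85 bits/weight is an estimate; the exact size depends on the GGUF file.
awk 'BEGIN { printf "weights ~= %.1f GiB\n", 32e9 * 4.85 / 8 / 2^30 }'
# -> weights ~= 18.1 GiB
```

The rest of the 33.4GiB estimate would come from the KV cache and per-GPU compute buffers, which grow when the model is split across 8 GPUs. An ~8GB footprint at Q4 corresponds to a much smaller model (roughly 13B), not a 32B one.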


@hnedelciuc commented on GitHub (Mar 24, 2025):

Wait... this feature is not available yet? I thought this capability already existed and assumed it wasn't working for me only because 9070 XT support hadn't been added yet.

Being able to run a larger model on more than one GPU simultaneously is an important feature, and many would appreciate it. I assume making it work across GPUs from different manufacturers at once would be harder, but please implement it at least so that a model can run on 2+ GPUs of the same brand (either AMD or NVIDIA). Making it work across different GPU models within the same brand would also be crucial (e.g., running a model on a 6950 XT and a 9070 XT at the same time).


@rick-github commented on GitHub (Mar 24, 2025):

Multiple different GPUs are supported.
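
For completeness, a minimal sketch of running ollama against several GPUs under Docker (the container layout is my assumption; `OLLAMA_SCHED_SPREAD` asks the scheduler to spread a model over all GPUs rather than packing it into as few as possible):

```sh
# Expose all GPUs to the container; ollama splits a model that does not fit
# on a single card across several of them. OLLAMA_SCHED_SPREAD=1 spreads the
# model over every available GPU instead of the minimum number of cards.
docker run -d --gpus all -e OLLAMA_SCHED_SPREAD=1 \
  --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
```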


@hnedelciuc commented on GitHub (Mar 24, 2025):

> Multiple different GPUs are supported.

That's what I thought. The title of this issue is confusing.

Reference: github-starred/ollama#68476