[GH-ISSUE #2362] Ollama Mixtral uses only 7% of the Nvidia RTX A4000 GPU. #27130

Closed
opened 2026-04-22 04:06:19 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @nejib1 on GitHub (Feb 5, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2362

Originally assigned to: @dhiltgen on GitHub.

Hello,

When I execute Ollama Mixtral with the Nvidia A4000 (16GB), I observe that only 7% of the GPU is utilized. Do you know why this might be happening? Additionally, the process seems somewhat slow. It appears that Ollama Mixtral is using 40% of the CPU but only 7% of the GPU.

![rp9k0CV 1](https://github.com/ollama/ollama/assets/10485460/cafc29e9-3068-4c44-af0d-a665c6b90ee9)

Do you have any suggestions on how to increase GPU utilization beyond 7%?

GiteaMirror added the gpu label 2026-04-22 04:06:19 -05:00
Author
Owner

@MatMatMatMatMatMat commented on GitHub (Feb 5, 2024):

Same here on a MacBook M1 Pro 32GB:
GPU usage with Mixtral is 0. Really slow.
Same prompt with Mistral: GPU usage between 70-90%. Really fast.

Author
Owner

@jmorganca commented on GitHub (Feb 5, 2024):

Hi @nejib1, it seems that your system is bottlenecked on the CPU, since the entire model won't fit into GPU memory (only part of it does). As you can see in `nvidia-smi` (thanks for sharing this 😊), it's using 14.8/16.3 GiB, which is almost all of your VRAM.

@MatMatMatMatMatMat thanks for the comment – GPU offloading isn't supported in macOS (yet!), so Mixtral will run on CPU on a 32GB MacBook Pro.
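To see why a 16 GiB card ends up only partially offloaded, here is a rough back-of-envelope sketch. The parameter counts and bits-per-weight figure are approximations, and `model_size_gib` is a hypothetical helper for illustration, not part of Ollama:

```python
def model_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Ignores KV cache and runtime overhead, so real VRAM use is higher.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

# Mixtral 8x7B has roughly 47B total parameters; 4-bit quantization
# stores about 4.5 bits per weight including scaling metadata.
mixtral_gib = model_size_gib(46.7, 4.5)
vram_gib = 16  # RTX A4000

print(f"Mixtral ~{mixtral_gib:.1f} GiB vs {vram_gib} GiB VRAM "
      f"-> fits entirely: {mixtral_gib <= vram_gib}")
```

Since the quantized Mixtral weights alone exceed 16 GiB, some layers stay in system RAM and run on the CPU, which explains the low GPU utilization and the slowdown; a 7B model like Mistral fits comfortably and runs fully on the GPU.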

Author
Owner

@Chakit22 commented on GitHub (Feb 9, 2024):

@jmorganca Mistral also runs on my system at maximum GPU usage, though the usage fluctuates (sometimes lower, sometimes higher), and I get a timeout error using Mistral on a Mac M2 Pro with 16GB RAM.

Author
Owner

@nejib1 commented on GitHub (Feb 9, 2024):

> Hi @nejib1, it seems that your system is bottlenecked on the CPU since the entire model won't fit into memory (only some does); as you can see in `nvidia-smi` (thanks for sharing this 😊), it's 14.8/16.3 GiB, which is almost all of your VRAM.
>
> @MatMatMatMatMatMat thanks for the comment – GPU offloading isn't supported in macOS (yet!) so Mixtral will run on CPU on a 32GB MacBook Pro.

Thank you for your help

Author
Owner

@dhiltgen commented on GitHub (Mar 12, 2024):

It looks like we can close this issue as resolved. If you're still having problems, please let us know.

Reference: github-starred/ollama#27130