[GH-ISSUE #3536] Why is my Ollama not using the GPU? #64218

Closed
opened 2026-05-03 16:38:30 -05:00 by GiteaMirror · 9 comments

Originally created by @loks666 on GitHub (Apr 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3536

My Ollama is running from the Windows installer setup. There was a problem: when I watched my Task Manager, I noticed that my GPU was not being used. I want to know why, or what command I need to run.
I hope someone can solve my problem, thanks.
![image](https://github.com/ollama/ollama/assets/33214035/1712abc8-33ba-441e-b6e4-ec7d6a34d662)

GiteaMirror added the question label 2026-05-03 16:38:30 -05:00

@aosan commented on GitHub (Apr 8, 2024):

What model are you using? I can see your memory is at 95%. The GPU is fully utilised by models that fit in VRAM; models using under 11 GB would fit in your 2080 Ti's VRAM.

GPU usage would show up when you make a request, e.g. run `ollama run mistral` and ask: "why is the sky blue?"

GPU load would appear while the model is providing the response.
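
A minimal way to verify this, assuming an NVIDIA GPU and the standard `ollama` and `nvidia-smi` CLIs:

```shell
# Terminal 1: one-shot prompt so the model loads and generates
ollama run mistral "why is the sky blue?"

# Terminal 2: refresh GPU stats every second while the answer streams;
# utilisation and VRAM usage should spike during generation
nvidia-smi -l 1
```

Note that Windows Task Manager's default GPU graph shows the 3D engine; CUDA compute load often only appears if you switch one of the engine graphs to "Cuda", which can make GPU usage look like zero even when it isn't.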


@loks666 commented on GitHub (Apr 8, 2024):

Are you saying that my computer will only use the GPU if I run a model that is smaller than my GPU memory?


@aosan commented on GitHub (Apr 8, 2024):

It depends on the model: some models offload layers to the GPU even if the whole model can't fit in VRAM, with the remaining layers running on the CPU.

For maximum performance, any model fitting in VRAM will benefit from GPU processing.
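
A quick way to see that split, assuming a recent Ollama build that includes the `ps` subcommand:

```shell
# Load a model, then check how its layers were placed
ollama run llama3 "hello"
ollama ps
# Illustrative output (not from this thread):
# NAME           ID            SIZE    PROCESSOR         UNTIL
# llama3:latest  365c0bd3c000  6.7 GB  48%/52% CPU/GPU   4 minutes from now
```

A `PROCESSOR` value of `100% GPU` means the whole model fit in VRAM; a mixed split means some layers spilled to the CPU.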


@loks666 commented on GitHub (Apr 11, 2024):

So when I install Ollama using the Windows installer, does it use the GPU to run large models by default? I don't need to do anything extra?


@pdevine commented on GitHub (Apr 12, 2024):

Hi @loks666, yes, it will try to offload as many "layers" of the model onto your GPU as possible. If the model is bigger than the memory of your GPU, those extra layers will be run on the CPU instead. You can try running smaller models, or use a more quantized version of the same model.

I'm going to go ahead and close the issue, but feel free to keep commenting.
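
To illustrate that suggestion, here is a sketch; the exact quantization tags are assumptions and should be checked against the model's page on ollama.com/library:

```shell
# See which models are installed locally and how big they are
ollama list

# Pull a more heavily quantized variant of the same model;
# a q4_0 build needs roughly half the memory of a q8_0 build
ollama pull llama3:8b-instruct-q4_0   # hypothetical tag; verify it exists on the library page
```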


@highbrow-228 commented on GitHub (Mar 17, 2025):

It doesn't help, at least in my case. I have a video card with 20 GB of VRAM. CUDA and the NVIDIA toolkit are installed. However, the models do not use the GPU at all 😩. I have tried llama3.2 as a small model that will fit, and the big one (llama3.3).
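
Some checks that usually narrow this down (a suggested diagnostic sequence, not from the thread; log locations depend on the platform):

```shell
# 1. Confirm the NVIDIA driver sees the GPU at all
nvidia-smi

# 2. Ask Ollama where the model actually ran (recent versions)
ollama run llama3.2 "hello"
ollama ps    # the PROCESSOR column should read "100% GPU"

# 3. Inspect the server log for GPU discovery errors
#    Windows: %LOCALAPPDATA%\Ollama\server.log
#    Linux (systemd): journalctl -u ollama --no-pager | grep -i gpu
```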


@Mohamed0Hegazi commented on GitHub (Mar 17, 2025):

Python, where is the book?


@Mohamed0Hegazi commented on GitHub (Mar 17, 2025):

Have you finished the book?
#Python
#Llama


@Mohamed0Hegazi commented on GitHub (Mar 17, 2025):

I asked Python to create the self-development e-book.

Reference: github-starred/ollama#64218