[GH-ISSUE #3486] ollama not using GPU in windows while all layers offloaded to gpu #2147

Closed
opened 2026-04-12 12:22:41 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @VSR2007 on GitHub (Apr 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3486

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

[new 1.txt](https://github.com/ollama/ollama/files/14865468/new.1.txt)
I am running Ollama on Windows with an NVIDIA RTX 2000 Ada Generation GPU (8 GB VRAM) and a 20-core Intel CPU with 64 GB RAM. Ollama somehow does not use the GPU for inference: when given a prompt, GPU utilization spikes for a moment (<1 s) and then stays at 0–1%, even though the model occupies only 4.5 GB of GPU RAM. I am using Mistral 7B.

What did you expect to see?

Better inference speed with full utilization of the GPU, especially since GPU RAM is not the limiting factor.

Steps to reproduce

Not sure

Are there any recent changes that introduced the issue?

No response

OS

Windows

Architecture

x86

Platform

No response

Ollama version

No response

GPU

Nvidia

GPU info

![tt](https://github.com/ollama/ollama/assets/107546824/2a5aae48-65aa-4e93-a4a5-39fe879a4be2)

CPU

Intel

Other software

No response

GiteaMirror added the bug, nvidia, windows labels 2026-04-12 12:22:41 -05:00
Author
Owner

@dhiltgen commented on GitHub (Apr 12, 2024):

The log you shared indicates it was loading into the GPU. What sort of token rate are you seeing?

I've seen other people report that if they stop the tray application, and run the server in a terminal manually they see much better performance. We still don't understand why, but that might be an experiment to try. Also, can you look at Task Manager when you're seeing the slow performance and let us know if you see anything notable? (e.g., some AV product chewing up a lot of CPU when you run inference, or something that might explain why it's being throttled.)
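The experiment suggested above can be sketched roughly as follows. This is a hypothetical walkthrough, not from the thread: the tray process name `ollama app.exe`, the `nvidia-smi` polling step, and the exact llama.cpp log wording are assumptions based on typical installs.

```shell
# Sketch of the suggested experiment (process/log names are assumptions):
# 1) Stop the tray application so it is not managing the server:
#      taskkill /F /IM "ollama app.exe"
# 2) Run the server manually in a terminal:
#      ollama serve
# 3) In a second terminal, sample GPU utilization once per second while prompting:
#      nvidia-smi -l 1
# 4) In the server output, look for the llama.cpp layer-offload line.
#    Illustrated here against a sample line rather than a real log:
printf 'llm_load_tensors: offloaded 33/33 layers to GPU\n' > sample.log
grep -c "layers to GPU" sample.log   # a match means layers went to the GPU
```

If the offload line shows all layers on the GPU but utilization still sits near zero during generation, that points at throttling or interference (e.g. the AV scenario mentioned above) rather than a CPU fallback.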

Author
Owner

@dhiltgen commented on GitHub (May 4, 2024):

If you're still having problems and it doesn't match #3511 please share more information and I'll re-open the ticket.


Reference: github-starred/ollama#2147