[GH-ISSUE #7472] GPU on Windows #4750

Closed
opened 2026-04-12 15:41:40 -05:00 by GiteaMirror · 4 comments

Originally created by @godlatro on GitHub (Nov 2, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7472

What is the issue?

I have the latest Ollama desktop
Nvidia 3060
Windows 10

When I try to use any model, CPU/GPU load is ~70%/20%.
I load many models one by one and unload the extra ones with `ollama stop model`.

Almost all models run terribly slowly. In 90% of cases the models can't even finish writing their answer and are interrupted every time when using the open-webui web interface in Docker.
But when I run Ubuntu 24 on this computer, it shows 100% load on my GPU and all models run perfectly fast.
How can I make Windows also use 100% of the GPU like on Ubuntu?

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.1.32

GiteaMirror added the bug label 2026-04-12 15:41:40 -05:00

@rick-github commented on GitHub (Nov 2, 2024):

Other applications in Windows may be using the GPU, leaving less room for ollama. [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may help with diagnosis, along with the output of `nvidia-smi`.
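A quick diagnostic sketch for the checks suggested above (assumes the NVIDIA driver's `nvidia-smi` tool is on PATH; the log path follows Ollama's troubleshooting docs for the Windows app and may differ on your install):

```shell
# List GPU state and every process currently holding GPU memory,
# to see what else is competing with ollama for VRAM
nvidia-smi

# Poll utilization and memory every second while a model is answering
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 1

# Tail the ollama server log (PowerShell; path per the troubleshooting guide)
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 50
```

If `nvidia-smi` shows other processes consuming most of the VRAM, ollama may fall back to partial CPU offload, which matches the ~70%/20% CPU/GPU split described in the report.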


@godlatro commented on GitHub (Nov 3, 2024):

It seems this is not a problem with ollama. It is open-webui that loads models differently than ollama does. In the console, ollama loads the model onto the GPU.


@godlatro commented on GitHub (Nov 3, 2024):

ollama for Windows loads the GPU to 100% in the terminal, like on Linux, but some models such as starcoder2 don't work or say nothing useful.


@rick-github commented on GitHub (Nov 3, 2024):

open-webui is probably using a larger context window than the default of 2048 tokens.

starcoder2 is generally not a chat model; it's used for FIM (fill-in-the-middle) code completion. starcoder2:instruct has a chat mode, but only about code.
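For context, the context-window size ollama uses can be set explicitly rather than inherited from whatever the front end requests. One documented way is a Modelfile with the `num_ctx` parameter (the base model name and the 8192 value here are illustrative assumptions, not from the thread):

```
FROM starcoder2
PARAMETER num_ctx 8192
```

Built with `ollama create starcoder2-8k -f Modelfile`, this creates a variant with a fixed 8192-token context; the same parameter can also be set interactively in `ollama run` via `/set parameter num_ctx 8192`. A larger context window uses more VRAM, which can push layers off the GPU and cause exactly the kind of slowdown reported above.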

Reference: github-starred/ollama#4750