[GH-ISSUE #9167] ollama start log output "detected OS VRAM overhead" #68023

Closed
opened 2026-05-04 12:16:34 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @chinafuxi on GitHub (Feb 17, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9167

What is the issue?

My computer has 128 GB of RAM and 22 GB of VRAM, but when I run a 32b model the log outputs:
time=2025-02-17T16:52:12.068+08:00 level=INFO source=gpu.go:319 msg="detected OS VRAM overhead" id=GPU-66fb2fe2-3f29-15f8-22fe-2670176a5d13 library=cuda compute=7.5 driver=12.8 name="NVIDIA GeForce RTX 2080 Ti" overhead="3.2 GiB"

Then, running the 32b model, Ollama can only use 18.3 GiB of VRAM when it should be able to use 20.2 GiB; the system occupies 3.2 GiB, so some model layers are loaded into system memory instead (RAM used: 2.1 GB).

time=2025-02-17T16:52:12.221+08:00 level=INFO source=memory.go:356 msg="offload to cuda" layers.requested=-1 layers.model=65 layers.offload=59 layers.split="" memory.available="[18.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.2 GiB"
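For context, the numbers in this log line are consistent with a simple layer-offload calculation. The sketch below is a simplified illustration, not Ollama's actual estimator in `memory.go` (which accounts for more than a uniform per-layer split, so it won't match exactly); all figures are taken from the log above.

```python
# Figures taken from the "offload to cuda" log line above
GIB = 1024 ** 3
available     = 18.3 * GIB  # memory.available, after the 3.2 GiB OS overhead
required_full = 20.2 * GIB  # memory.required.full for the whole model
n_layers      = 65          # layers.model

# Naive assumption: VRAM cost is spread evenly across layers
per_layer = required_full / n_layers

# Offload as many whole layers as fit in the available VRAM;
# the remainder runs on the CPU from system RAM
offload = min(n_layers, int(available // per_layer))
print(offload)  # 58 — in the same ballpark as the 59 layers.offload reported
```

This shows why only 59 of 65 layers land on the GPU: the 3.2 GiB overhead shrinks the available VRAM below the 20.2 GiB the full model needs.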

![Image](https://github.com/user-attachments/assets/32d842f1-b9e2-4311-8824-8996c9d2695c)

My OS version info:
OS Name: Microsoft Windows 10 Pro
OS Version: 10.0.18363 N/A Build 18363

This is the VRAM usage with Ollama stopped:

![Image](https://github.com/user-attachments/assets/fcfa3f20-6558-4e02-b07c-f642e857f59f)

Relevant log output


OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.5.9

GiteaMirror added the bug label 2026-05-04 12:16:34 -05:00
Author
Owner

@chinafuxi commented on GitHub (Feb 19, 2025):

I found the problem discussed in a Microsoft forum: this is a Windows 10 WDDM issue; about 15% of VRAM is reserved by the OS.

https://answers.microsoft.com/en-us/windows/forum/all/windows-10-does-not-let-cuda-applications-to-use/cffb3fcd-5a21-46cf-8123-aa53bb8bafd6
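A quick sanity check shows the reported overhead matches that ~15% figure. The only assumption here is that the card's visible VRAM is the 18.3 GiB available plus the 3.2 GiB overhead from the log (≈ 21.5 GiB):

```python
GIB = 1024 ** 3
overhead  = 3.2 * GIB                 # "detected OS VRAM overhead" from the log
available = 18.3 * GIB                # memory.available from the log
total     = overhead + available      # assumed visible VRAM, ~21.5 GiB

reserved_fraction = overhead / total
print(f"{reserved_fraction:.1%}")     # 14.9% — consistent with the ~15% WDDM reservation
```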


Reference: github-starred/ollama#68023