[PR #12490] [MERGED] Workaround broken NVIDIA iGPU free VRAM data #13843

Closed
opened 2026-04-13 00:38:23 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12490
Author: @dhiltgen
Created: 10/3/2025
Status: Merged
Merged: 10/3/2025
Merged by: @dhiltgen

Base: main ← Head: nvidia_igpu


📝 Commits (1)

  • 637aa50 Workaround broken NVIDIA iGPU free VRAM data

📊 Changes

1 file changed (+32 additions, -0 deletions)


📝 discover/runner.go (+32 -0)

📄 Description

The CUDA APIs for reporting free VRAM are useless on NVIDIA iGPU systems: they return only the kernel's actual free memory and ignore buff/cache allocations, which on a typical system quickly fill most of the free system memory. As a result, we incorrectly conclude that very little memory is available for GPU allocations.

Without this change, on a "warm" system (where the kernel has had time to fill up the filesystem cache):

```
time=2025-10-03T16:52:19.742Z level=INFO source=types.go:111 msg="inference compute" id=GPU-190c986e-21ac-c241-342f-fc9235808ccd library=CUDA compute=12.1 name=CUDA0 description="NVIDIA GB10" libdirs=ollama,cuda_v13 driver=13.0 pci_id=01:00.f type=iGPU total="119.7 GiB" available="2.9 GiB"
```

With this change:

```
time=2025-10-03T18:23:24.889Z level=INFO source=types.go:111 msg="inference compute" id=GPU-190c986e-21ac-c241-342f-fc9235808ccd library=CUDA compute=12.1 name=CUDA0 description="NVIDIA GB10" libdirs=ollama,cuda_v13 driver=13.0 pci_id=01:00.f type=iGPU total="119.7 GiB" available="115.8 GiB"
```

For comparison:

```
% free -h
               total        used        free      shared  buff/cache   available
Mem:           119Gi       3.5Gi       4.6Gi       2.1Mi       112Gi       116Gi
Swap:             0B          0B          0B
```
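The `free -h` output above shows why `buff/cache` matters: the kernel's "available" column (derived from `MemAvailable` in `/proc/meminfo`) counts reclaimable cache, while a naive "free" number does not. Below is a minimal, hypothetical Go sketch of that kind of fallback, parsing a `/proc/meminfo`-style text for `MemAvailable`. The function name `memAvailableKiB` and the sample data are illustrative assumptions, not the actual `discover/runner.go` code from this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// memAvailableKiB parses /proc/meminfo-style text and returns the
// MemAvailable value in KiB. On an iGPU with unified memory, this
// kernel estimate (which counts reclaimable buff/cache) is a better
// proxy for allocatable memory than the CUDA free-VRAM query.
// Hypothetical sketch; not the code merged in this PR.
func memAvailableKiB(meminfo string) (uint64, bool) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		// Lines look like: "MemAvailable:   121634816 kB"
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "MemAvailable:" {
			v, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, false
			}
			return v, true
		}
	}
	return 0, false
}

func main() {
	// Sample meminfo text (made-up values for illustration).
	sample := `MemTotal:       125536256 kB
MemFree:         4823040 kB
MemAvailable:   121634816 kB
Buffers:          102400 kB
Cached:        117440512 kB`
	if kib, ok := memAvailableKiB(sample); ok {
		fmt.Printf("available ≈ %.1f GiB\n", float64(kib)/(1024*1024))
	}
}
```

With the sample values this reports roughly 116 GiB available, matching the shape of the "With this change" log line, where most of the 119.7 GiB total is usable despite the cache being warm.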

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:38:23 -05:00

Reference: github-starred/ollama#13843