[PR #12619] [MERGED] ml/backend/ggml: NVML fallback for unified memory GPUs #13889

Closed
opened 2026-04-13 00:39:31 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12619
Author: @sbhavani
Created: 10/14/2025
Status: Merged
Merged: 10/15/2025
Merged by: @dhiltgen

Base: main ← Head: nvml-unified-memory-fallback


📝 Commits (2)

  • 08a3f9c Simplify NVML fallback for unified memory GPUs
  • 0486a10 Add NVML fallback patch for unified memory GPUs

📊 Changes

2 files changed (+205 additions, -3 deletions)

➕ llama/patches/0029-NVML-fallback-for-unified-memory-GPUs.patch (+137 -0)
📝 ml/backend/ggml/ggml/src/mem_nvml.cpp (+68 -3)

📄 Description

Adds NVML fallback logic for unified memory GPUs (e.g. NVIDIA GB10). When NVML returns NVML_ERROR_NOT_SUPPORTED for memory queries, the code now falls back to reading memory information from /proc/meminfo.
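
The control flow described above can be sketched roughly as follows. This is an illustration only, not the code from the patch: `QueryStatus`, `MemInfo`, `QueryResult`, and `query_memory` are hypothetical names, the status enum merely stands in for NVML's return codes, and how the actual patch treats errors other than "not supported" is not stated in this description (here they simply yield no result).

```cpp
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical status codes standing in for NVML return values; the real
// constants live in nvml.h and are not reproduced here.
enum class QueryStatus { Success, NotSupported, OtherError };

struct MemInfo {
    std::uint64_t total_bytes;
    std::uint64_t free_bytes;
};

// Outcome of the primary query: info is meaningful only on Success.
struct QueryResult {
    QueryStatus status;
    MemInfo info;
};

// Try the primary (NVML-like) query first. Only a "not supported" status
// triggers the fallback; any other failure yields no result (an assumption
// made for this sketch).
std::optional<MemInfo> query_memory(
        const std::function<QueryResult()>& primary,
        const std::function<std::optional<MemInfo>()>& fallback) {
    const QueryResult r = primary();
    if (r.status == QueryStatus::Success) {
        return r.info;
    }
    if (r.status == QueryStatus::NotSupported) {
        return fallback();
    }
    return std::nullopt;
}
```

Passing the two queries in as callables keeps the decision logic testable without a GPU; in a real build the primary would wrap the NVML memory query and the fallback would read /proc/meminfo.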

Why

Using /proc/meminfo's MemAvailable avoids the underreporting issue on unified memory systems described in https://nvidia.custhelp.com/app/answers/detail/a_id/5728
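
As a rough illustration of reading these values, the sketch below parses `MemTotal` and `MemAvailable` from text in /proc/meminfo format. `HostMemory` and `parse_meminfo` are hypothetical names chosen for this example and are not taken from the patch; the only facts relied on are the field names and that /proc/meminfo reports sizes in units of 1024 bytes (labelled "kB").

```cpp
#include <cstdint>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

struct HostMemory {
    std::uint64_t total_bytes;
    std::uint64_t available_bytes;
};

// Parse MemTotal and MemAvailable from text in /proc/meminfo format.
// Lines look like "MemAvailable:   120000000 kB"; values are in units
// of 1024 bytes.
std::optional<HostMemory> parse_meminfo(std::istream& in) {
    std::optional<std::uint64_t> total_kib;
    std::optional<std::uint64_t> avail_kib;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        std::uint64_t value = 0;
        if (!(fields >> key >> value)) {
            continue;  // skip lines that are not "Key: number ..."
        }
        if (key == "MemTotal:") {
            total_kib = value;
        } else if (key == "MemAvailable:") {
            avail_kib = value;
        }
    }
    // Both fields are required; report failure if either is missing.
    if (!total_kib || !avail_kib) {
        return std::nullopt;
    }
    return HostMemory{*total_kib * 1024, *avail_kib * 1024};
}
```

Taking a `std::istream&` rather than a path lets the same function be fed from a `std::ifstream` opened on /proc/meminfo or from an in-memory string in tests.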

Testing

Tested on NVIDIA GB10 (unified memory GPU):

  • Multiple model load/unload cycles
  • Memory reporting accurate and consistent:
    • Total: 119.6 GiB
    • Available: ~115.1 GiB
    • Model VRAM usage: 1.2 GiB
  • Confirmed memory values match /proc/meminfo calculations

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:39:31 -05:00
Reference: github-starred/ollama#13889