[PR #15385] cuda: restore integrated GPU detection from device properties #25675

Open · opened 2026-04-19 18:20:45 -05:00 by GiteaMirror

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15385
Author: @zidad
Created: 2026-04-07
Status: 🔄 Open

Base: `main` ← Head: `fix/hip-igpu-integrated-detection`


📝 Commits (1)

  • cff7154 cuda: restore integrated GPU detection from device properties

📊 Changes

1 file changed (+1 addition, -1 deletion)


📝 ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu (+1 -1)

📄 Description

Summary

Re-enable reading prop.integrated from the CUDA/HIP runtime instead of hardcoding it to false.

The property read had been disabled with a comment referencing #15034, but that issue is actually about macOS window management, not GPU output corruption, suggesting the disable was either precautionary or cited the wrong issue.

What this fixes

With `integrated=false` hardcoded, multiple downstream code paths that already exist and work correctly are never activated (a standalone sketch of this gating follows the list):

  • mem_hip.cpp: GTT memory detection — reads mem_info_gtt_total when is_integrated_gpu=true, adding shared system RAM to available GPU memory
  • ggml-cuda.cu:4468: UMA allocation path — uses system memory info instead of cudaMemGetInfo for unified memory systems
  • ggml-cuda.cu:5133: already reads prop.integrated correctly for the device context, but the info struct override at line 300 prevents the scheduler from seeing it
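
The gating is easiest to see in isolation. The following is a minimal standalone sketch, not the ggml source: it assumes a HIP runtime and Linux `sysinfo`, and shows how `prop.integrated` can select between system-memory reporting (the UMA path) and `hipMemGetInfo` (the discrete-VRAM path).

```cpp
// Hedged sketch of integrated-GPU gating; field and path names follow the
// PR description, not the exact ggml-cuda.cu code.
#include <hip/hip_runtime.h>
#include <sys/sysinfo.h>
#include <cstdio>

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess) return 1;

    for (int id = 0; id < count; ++id) {
        hipDeviceProp_t prop;
        if (hipGetDeviceProperties(&prop, id) != hipSuccess) continue;

        size_t free_b = 0, total_b = 0;
        if (prop.integrated) {
            // UMA/iGPU: dedicated VRAM is tiny; report system memory,
            // which is what the GPU can actually address through GTT.
            struct sysinfo si;
            if (sysinfo(&si) == 0) {
                total_b = (size_t) si.totalram * si.mem_unit;
                free_b  = (size_t) si.freeram  * si.mem_unit;
            }
        } else {
            // Discrete GPU: ask the runtime for real VRAM numbers.
            hipSetDevice(id);
            hipMemGetInfo(&free_b, &total_b);
        }
        printf("device %d (%s): integrated=%d total=%zu free=%zu\n",
               id, prop.name, (int) prop.integrated, total_b, free_b);
    }
    return 0;
}
```

With `integrated` hardcoded to `false`, a scheduler reading this info never takes the first branch, which is exactly the failure mode described above.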

This affects all AMD APU/iGPU users (Phoenix gfx1103, Hawk Point gfx1151, Strix gfx1200/1201, etc.) where VRAM is typically 512MB–2GB but GTT (shared system RAM) provides 16–100+ GB of GPU-accessible memory. Without this fix, ollama sees only the small VRAM, decides no model fits, and falls back to 100% CPU inference.
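
For reference, a minimal sketch of reading the GTT counters on Linux. The `mem_info_gtt_*` and `mem_info_vram_total` files are real amdgpu sysfs nodes, but the `card0` index and the `read_u64` helper are illustrative assumptions, not ollama code.

```cpp
// Hedged sketch: querying amdgpu GTT/VRAM sizes from sysfs.
#include <cinttypes>
#include <cstdint>
#include <cstdio>

static bool read_u64(const char *path, uint64_t *out) {
    FILE *f = fopen(path, "r");
    if (!f) return false;
    bool ok = fscanf(f, "%" SCNu64, out) == 1;
    fclose(f);
    return ok;
}

int main() {
    const char *base = "/sys/class/drm/card0/device"; // assumes card0 is the APU
    char path[256];
    uint64_t gtt_total = 0, gtt_used = 0, vram_total = 0;

    snprintf(path, sizeof path, "%s/mem_info_gtt_total", base);
    read_u64(path, &gtt_total);
    snprintf(path, sizeof path, "%s/mem_info_gtt_used", base);
    read_u64(path, &gtt_used);
    snprintf(path, sizeof path, "%s/mem_info_vram_total", base);
    read_u64(path, &vram_total);

    // On an APU, usable GPU memory is roughly VRAM plus free GTT.
    printf("vram_total=%" PRIu64 " gtt_total=%" PRIu64 " gtt_free=%" PRIu64 "\n",
           vram_total, gtt_total, gtt_total - gtt_used);
    return 0;
}
```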

Test results

Tested on AMD Phoenix3 (gfx1103) with 2GB dedicated VRAM + 50GB GTT (amdgpu.gttsize=51200):

| | Before | After |
|---|---|---|
| `total_memory` | 2.0 GiB | 52 GiB |
| `integrated` | `false` | `true` |
| Layers offloaded | 0/65 (100% CPU) | 65/65 (100% GPU) |

The /info endpoint correctly reports:

{"id":"0", "backend":"ROCm", "integration":true, "description":"AMD Radeon 780M Graphics",
 "total_memory":55834574848, "free_memory":55743823872, "ComputeMajor":17, "ComputeMinor":3}

Change

```diff
-info.devices[id].integrated = false; // Temporarily disabled due to issues with corrupted output (e.g. #15034)
+info.devices[id].integrated = prop.integrated;
```

Fixes #5471, #6362, #12062, #12342, #13173, #13419


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/ollama#25675