[PR #3088] [MERGED] Fix iGPU detection for linux #9800

Closed
opened 2025-11-12 15:14:30 -06:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/3088
Author: @dhiltgen
Created: 3/13/2024
Status: Merged
Merged: 3/13/2024
Merged by: @dhiltgen

Base: main ← Head: rocm_igpu_linux


📝 Commits (1)

  • 82b0c7c Fix iGPU detection for linux

📊 Changes

2 files changed (+28 additions, -14 deletions)


📝 gpu/amd_common.go (+2 -4)
📝 gpu/amd_linux.go (+26 -10)

📄 Description

This fixes a few bugs in the new sysfs discovery logic. iGPUs are now correctly identified by their reported VRAM of less than 1G. The sysfs IDs are off by one compared to what HIP wants, because the CPU is reported as a node in amdgpu while HIP only cares about GPUs.

Tested on a Ryzen 9 7900X system with an RX 7900 XTX. The amdgpu driver exposes 3 nodes: 0 is the CPU, 1 is the discrete GPU, and 2 is the iGPU. This logic now correctly detects this system and sets the visible devices properly.
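
A minimal sketch of the filtering described above, not the code from this PR. The `node` type, the `iGPUMemLimit` constant, and the helper names are hypothetical, and the sketch assumes CPU nodes are simply skipped when counting HIP indices, which gives the off-by-one relative to sysfs on the test system.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// iGPUMemLimit is an assumed threshold: a GPU reporting less than
// 1 GiB of total memory is treated as an iGPU and skipped.
const iGPUMemLimit uint64 = 1 << 30

// node is a hypothetical stand-in for one node reported under sysfs.
type node struct {
	sysfsID     int
	isCPU       bool
	totalMemory uint64 // bytes
}

// visibleDevices returns the HIP indices of the GPUs worth exposing.
// HIP numbers only GPUs, so CPU nodes do not advance the HIP index.
func visibleDevices(nodes []node) []int {
	ids := []int{}
	hipID := 0
	for _, n := range nodes {
		if n.isCPU {
			continue // present in sysfs, absent from HIP's numbering
		}
		if n.totalMemory >= iGPUMemLimit {
			ids = append(ids, hipID)
		}
		hipID++ // iGPUs still occupy a HIP index even when skipped
	}
	return ids
}

// joinIDs formats the indices as a comma-separated list.
func joinIDs(ids []int) string {
	parts := make([]string, len(ids))
	for i, id := range ids {
		parts[i] = strconv.Itoa(id)
	}
	return strings.Join(parts, ",")
}

func main() {
	// The three nodes of the test system described above.
	nodes := []node{
		{sysfsID: 0, isCPU: true},
		{sysfsID: 1, totalMemory: 24560 << 20}, // RX 7900 XTX
		{sysfsID: 2, totalMemory: 512 << 20},   // iGPU
	}
	fmt.Println("HIP_VISIBLE_DEVICES=" + joinIDs(visibleDevices(nodes)))
}
```

With the three nodes from the test system, only the discrete GPU survives the filter, and it maps to HIP index 0 even though its sysfs ID is 1.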

Example scenario 1:

% OLLAMA_DEBUG=1 ./ollama-linux-amd64 serve
..
time=2024-03-13T00:03:55.440Z level=DEBUG source=amd_linux.go:110 msg="rocm supported GPU types [gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"
time=2024-03-13T00:03:55.440Z level=INFO source=amd_linux.go:119 msg="amdgpu [0] gfx1100 is supported"
time=2024-03-13T00:03:55.440Z level=WARN source=amd_linux.go:114 msg="amdgpu [1] gfx1036 is not supported by /tmp/ollama2436258655/rocm [gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"
time=2024-03-13T00:03:55.440Z level=WARN source=amd_linux.go:116 msg="See https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for HSA_OVERRIDE_GFX_VERSION usage"
time=2024-03-13T00:03:55.440Z level=DEBUG source=amd_linux.go:152 msg="discovering VRAM for amdgpu devices"
time=2024-03-13T00:03:55.440Z level=DEBUG source=amd_linux.go:171 msg="amdgpu devices [0 1]"
time=2024-03-13T00:03:55.441Z level=INFO source=amd_linux.go:246 msg="[0] amdgpu totalMemory 24560M"
time=2024-03-13T00:03:55.441Z level=INFO source=amd_linux.go:247 msg="[0] amdgpu freeMemory  24560M"
time=2024-03-13T00:03:55.441Z level=INFO source=amd_common.go:54 msg="Setting HIP_VISIBLE_DEVICES=0"
time=2024-03-13T00:03:55.441Z level=DEBUG source=gpu.go:180 msg="rocm detected 1 devices with 22104M available memory"

If I force the override to bypass the compatibility check, the new iGPU detection logic kicks in (both of these scenarios work):

% OLLAMA_DEBUG=1 HSA_OVERRIDE_GFX_VERSION=11.0.0 ./ollama-linux-amd64 serve
...
time=2024-03-13T00:04:59.799Z level=DEBUG source=amd_linux.go:123 msg="skipping rocm gfx compatibility check with HSA_OVERRIDE_GFX_VERSION=11.0.0"
time=2024-03-13T00:04:59.799Z level=DEBUG source=amd_linux.go:152 msg="discovering VRAM for amdgpu devices"
time=2024-03-13T00:04:59.799Z level=DEBUG source=amd_linux.go:171 msg="amdgpu devices [0 1]"
time=2024-03-13T00:04:59.799Z level=INFO source=amd_linux.go:246 msg="[0] amdgpu totalMemory 24560M"
time=2024-03-13T00:04:59.799Z level=INFO source=amd_linux.go:247 msg="[0] amdgpu freeMemory  24560M"
time=2024-03-13T00:04:59.799Z level=INFO source=amd_linux.go:217 msg="amdgpu [1] appears to be an iGPU with 512M reported total memory, skipping"
time=2024-03-13T00:04:59.799Z level=INFO source=amd_common.go:54 msg="Setting HIP_VISIBLE_DEVICES=0"
time=2024-03-13T00:04:59.799Z level=DEBUG source=gpu.go:180 msg="rocm detected 1 devices with 22104M available memory"
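
The two runs differ only in whether the gfx compatibility check is applied. A rough sketch of that decision follows, again not the PR's code: `unsupportedGFX` is a hypothetical helper, and it assumes that any non-empty `HSA_OVERRIDE_GFX_VERSION` value skips the check entirely, as the debug message in the second run suggests.

```go
package main

import (
	"fmt"
	"os"
)

// unsupportedGFX returns the detected gfx targets that are missing from
// the supported list. When an override is set, the check is skipped and
// nothing is reported as unsupported.
func unsupportedGFX(detected, supported []string, override string) []string {
	if override != "" {
		return nil
	}
	known := make(map[string]bool, len(supported))
	for _, s := range supported {
		known[s] = true
	}
	var missing []string
	for _, g := range detected {
		if !known[g] {
			missing = append(missing, g)
		}
	}
	return missing
}

func main() {
	// Supported list and detected targets copied from the logs above.
	supported := []string{
		"gfx1030", "gfx1100", "gfx1101", "gfx1102", "gfx900", "gfx906",
		"gfx908", "gfx90a", "gfx940", "gfx941", "gfx942",
	}
	detected := []string{"gfx1100", "gfx1036"}

	override := os.Getenv("HSA_OVERRIDE_GFX_VERSION")
	for _, g := range unsupportedGFX(detected, supported, override) {
		fmt.Printf("%s is not supported\n", g)
	}
}
```

In the first run the iGPU (gfx1036) is rejected by this check; in the second run the check is bypassed, so the iGPU is excluded by its reported 512M total memory instead.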

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2025-11-12 15:14:30 -06:00
Reference: github-starred/ollama-ollama#9800