[PR #2264] [CLOSED] Add support for MIG mode detection and use #73134

Closed
opened 2026-05-05 04:48:36 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/2264
Author: @waTeim
Created: 1/30/2024
Status: Closed

Base: main ← Head: main


📝 Commits (2)

  • 310ed37 Add support for MIG mode detection and using the MIG device
  • e97dac2 Map all cuda devices with their MIG instances, refactor, add some comments

📊 Changes

2 files changed (+259 additions, -81 deletions)

View changed files

📝 gpu/gpu_info_cuda.c (+247 -81)
📝 gpu/gpu_info_cuda.h (+12 -0)

📄 Description

When the startup code checks the GPU's capabilities so that it can allocate resources (memory in particular), it mistakenly queries the host GPU rather than the MIG instance. This PR changes the CUDA GPU detection algorithm: for each host GPU, check whether that GPU supports MIG and whether MIG is enabled, and if so, iterate over all of its MIG instances. The result is a device map:

typedef struct {
  unsigned numDevices;
  nvmlDevice_t **layout;
} deviceMap_t;

Later, that map can be iterated over. layout[i][0] is a pointer to the ith host GPU, and layout[i][j + 1] is the jth MIG instance of host GPU i. A value of (void*)0 marks the end of the MIG instance list. A host GPU can have at most 7 MIG instances, so the pointer array for each host has 9 entries: one for the host GPU, up to seven for MIG instances, and one for the terminator. Both cuda_check_vram and cuda_compute_capability were updated to use this new data structure.
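As a rough illustration of how such a map could be walked, here is a minimal sketch. It is not the code from this PR: the helper names count_mig_instances and count_usable_devices, the MAX_MIG_INSTANCES and LAYOUT_SLOTS macros, and the stand-in nvmlDevice_t typedef (used so the sketch compiles without nvml.h) are all assumptions made for the example. It also assumes that MIG instances, when present, are used in place of their host GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for NVML's opaque device handle (assumption: lets the sketch
 * build without nvml.h; real code would include the NVML header). */
typedef struct nvmlDevice_st *nvmlDevice_t;

/* Hypothetical constants: 1 host slot + up to 7 MIG slots + 1 terminator. */
#define MAX_MIG_INSTANCES 7
#define LAYOUT_SLOTS (1 + MAX_MIG_INSTANCES + 1)

/* Same shape as the struct in the PR description. */
typedef struct {
  unsigned numDevices;
  nvmlDevice_t **layout;
} deviceMap_t;

/* Count the MIG instances of host GPU i: walk layout[i][1..] until the
 * NULL terminator (or the 7-instance limit) is reached. */
static unsigned count_mig_instances(const deviceMap_t *map, unsigned i) {
  assert(map != NULL && i < map->numDevices);
  unsigned n = 0;
  while (n < MAX_MIG_INSTANCES && map->layout[i][n + 1] != NULL) {
    n++;
  }
  return n;
}

/* Count the devices a capability check would visit, assuming MIG
 * instances replace their host GPU when any exist (assumption). */
static unsigned count_usable_devices(const deviceMap_t *map) {
  assert(map != NULL);
  unsigned total = 0;
  for (unsigned i = 0; i < map->numDevices; i++) {
    unsigned mig = count_mig_instances(map, i);
    total += (mig > 0) ? mig : 1;
  }
  return total;
}
```

A caller doing a VRAM or compute-capability check would follow the same loop but query each visited handle instead of counting it.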

MIG-related API calls were added to enable this; see multi GPU management (https://docs.nvidia.com/deploy/archive/R520/nvml-api/group__nvmlMultiInstanceGPU.html) for details.

Addresses #1500


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 04:48:36 -05:00

Reference: github-starred/ollama#73134