[PR #4517] [MERGED] Enhanced GPU discovery and multi-gpu support with concurrency #37385

Closed
opened 2026-04-22 22:05:54 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/4517
Author: @dhiltgen
Created: 5/18/2024
Status: Merged
Merged: 6/14/2024
Merged by: @dhiltgen

Base: main ← Head: gpu_incremental


📝 Commits (10+)

  • efac488 Revert "Limit GPU lib search for now (#4777)"
  • fb9cdfa Fix server.cpp for the new cuda build macros
  • b32ebb4 Use DRM driver for VRAM info for amd
  • 43ed358 Refine GPU discovery to bootstrap once
  • 206797b Fix concurrency integration test to work locally
  • 6fd04ca Improve multi-gpu handling at the limit
  • 5e8ff55 Support forced spreading for multi GPU
  • 68dfc62 refined test timing
  • 48702dd Harden unload for empty runners
  • 4e2b7e1 Refactor intel gpu discovery

📊 Changes

31 files changed (+1814 additions, -692 deletions)


📝 envconfig/config.go (+12 -0)
📝 gpu/amd_linux.go (+131 -75)
📝 gpu/amd_windows.go (+47 -16)
📝 gpu/cpu_common.go (+4 -9)
📝 gpu/gpu.go (+334 -160)
📝 gpu/gpu_darwin.go (+12 -1)
📝 gpu/gpu_info.h (+2 -0)
➖ gpu/gpu_info_cpu.c (+0 -45)
📝 gpu/gpu_info_cudart.c (+3 -1)
📝 gpu/gpu_info_cudart.h (+2 -1)
📝 gpu/gpu_info_nvcuda.c (+38 -3)
📝 gpu/gpu_info_nvcuda.h (+2 -1)
➕ gpu/gpu_info_nvml.c (+104 -0)
➕ gpu/gpu_info_nvml.h (+48 -0)
📝 gpu/gpu_info_oneapi.c (+166 -123)
📝 gpu/gpu_info_oneapi.h (+34 -42)
➕ gpu/gpu_linux.go (+89 -0)
➕ gpu/gpu_windows.go (+55 -0)
📝 gpu/types.go (+50 -3)
📝 integration/concurrency_test.go (+59 -17)

...and 11 more files

📄 Description

Carries #4266 and #4441, and obsoletes them if this PR moves forward first.

This refines GPU discovery by splitting it into two phases: a bootstrap phase that discovers information about the GPUs once at startup, and an incremental refresh that afterwards updates only free-space information, instead of fully rediscovering the GPUs over and over.

Fixes #3158
Fixes #4198
Fixes #3765
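
For readers skimming the description, the bootstrap-then-refresh split can be sketched roughly as follows. This is a minimal illustration only, not the code in this PR: `GpuInfo`, `Prober`, `Discovery`, `GetGPUInfo`, and `fakeProber` are all hypothetical names invented here, and the real discovery logic in `gpu/gpu.go` may be structured quite differently. The sketch assumes that full enumeration is expensive and that a per-device free-memory query is cheap.

```go
package main

import "sync"

// GpuInfo is a hypothetical record; the field names are illustrative only
// and are not taken from gpu/types.go.
type GpuInfo struct {
	ID          string // stable identifier found during bootstrap
	Library     string // backend label, e.g. "cuda" (assumed values)
	TotalMemory uint64 // bytes; assumed not to change after bootstrap
	FreeMemory  uint64 // bytes; refreshed on every query
}

// Prober is a hypothetical abstraction over the driver libraries.
type Prober interface {
	Enumerate() []GpuInfo                 // expensive: full discovery
	FreeMemory(id string) (uint64, error) // cheap: free memory for one device
}

// Discovery bootstraps once, then only refreshes free-memory figures.
type Discovery struct {
	prober Prober
	once   sync.Once
	mu     sync.Mutex
	gpus   []GpuInfo
}

func NewDiscovery(p Prober) *Discovery { return &Discovery{prober: p} }

// GetGPUInfo runs full enumeration on the first call only. Every call,
// including the first, then updates FreeMemory for the known devices and
// returns a copy so callers cannot mutate the cached list.
func (d *Discovery) GetGPUInfo() []GpuInfo {
	d.once.Do(func() {
		d.gpus = d.prober.Enumerate()
	})

	d.mu.Lock()
	defer d.mu.Unlock()
	for i := range d.gpus {
		// On error, keep the last known value rather than failing the call.
		if free, err := d.prober.FreeMemory(d.gpus[i].ID); err == nil {
			d.gpus[i].FreeMemory = free
		}
	}
	out := make([]GpuInfo, len(d.gpus))
	copy(out, d.gpus)
	return out
}

// fakeProber is a stand-in used to show that enumeration happens only once.
type fakeProber struct {
	enumerations int
	free         uint64
}

func (f *fakeProber) Enumerate() []GpuInfo {
	f.enumerations++
	return []GpuInfo{{ID: "GPU-0", Library: "cuda", TotalMemory: 8 << 30, FreeMemory: f.free}}
}

func (f *fakeProber) FreeMemory(id string) (uint64, error) { return f.free, nil }
```

With the stand-in prober, calling `GetGPUInfo` repeatedly enumerates devices a single time while the reported free memory tracks whatever the prober returns on each call. Using `sync.Once` for the bootstrap step is one possible design choice here; it keeps the expensive path out of the hot path without requiring callers to remember an explicit initialization call.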


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-22 22:05:54 -05:00
Reference: github-starred/ollama#37385