[PR #7378] Expose ROCm P2P workaround as runtime config #74696

Open
opened 2026-05-05 06:56:46 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/7378
Author: @dhiltgen
Created: 10/26/2024
Status: 🔄 Open

Base: main ← Head: runtime_p2p_workaround


📝 Commits (1)

  • c404f38 Expose ROCm P2P workaround as runtime config

📊 Changes

4 files changed (+86 additions, -10 deletions)


📝 docs/gpu.md (+9 -1)
📝 envconfig/config.go (+2 -0)
📝 llama/ggml-cuda.cu (+9 -9)
➕ llama/patches/0013-no-peer-copy-workaround-at-runtime.patch (+66 -0)

📄 Description

We have toggled GGML_CUDA_NO_PEER_COPY back and forth a few times over the past few months. Ultimately it seems that with the workaround, systems with less physical RAM than VRAM crash, while without it, some multi-GPU systems generate gibberish. With the current llama.cpp code, we can't ship a single binary that keeps both sets of users happy.

Instead of compiling in support for the P2P workaround, make it available as a runtime configuration so we can ship a single binary that works in both cases (small system memory, and multi-GPU with buggy P2P copy).
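
For illustration only, here is a minimal sketch of how such a runtime toggle could be read on the Go side. The environment variable name OLLAMA_GPU_NO_PEER_COPY and the NoPeerCopy helper are assumptions for the example, not necessarily what this patch adds:

```go
// Hypothetical sketch of a runtime toggle for the ROCm P2P workaround.
// The env var name OLLAMA_GPU_NO_PEER_COPY is assumed for illustration.
package envconfig

import (
	"os"
	"strconv"
)

// NoPeerCopy reports whether the peer-to-peer copy workaround should be
// enabled at runtime instead of being baked in at compile time.
func NoPeerCopy() bool {
	v, err := strconv.ParseBool(os.Getenv("OLLAMA_GPU_NO_PEER_COPY"))
	if err != nil {
		// Unset or unparsable: default to normal P2P copies.
		return false
	}
	return v
}
```

The carried llama.cpp patch (0013-no-peer-copy-workaround-at-runtime.patch) would presumably let ggml-cuda.cu consult a setting like this at runtime where it previously checked the GGML_CUDA_NO_PEER_COPY compile-time define.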

Marking draft until I can answer a few questions:

  • I'm hoping we can find a simpler env var or setting in ROCm so we don't have to carry this patch.
  • Failing that, if we can find a way to "know" that P2P will be buggy on a given system, I'd prefer to enable the workaround automatically rather than requiring the user to set it manually. I'm not sure that detection will be possible or reliable, and the setting has a negative performance impact when P2P is working properly.

Fixes #6595
Fixes #5629
Fixes #7433
Fixes #7461


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 06:56:46 -05:00

Reference: github-starred/ollama#74696