[GH-ISSUE #8635] Use of System Ram over RDMA in GPU to allow for GPU acceleration on lower VRAM hardware. #67648

Open
opened 2026-05-04 11:12:28 -05:00 by GiteaMirror · 0 comments

Originally created by @SlinkierElm5611 on GitHub (Jan 28, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8635

Hi all!

I'm a GPU dev who has been messing around with Ollama for some self-hosting. I was wondering if there is any reason Ollama has not been able to take advantage of GPU acceleration while using system RAM through RDMA (ReBAR). I have done system RAM access through RDMA on the GPU for real-time processing and have had better results than with CPU-side tasks, despite the increase in data latency when going over PCIe.

I look forward to hearing from you!

GiteaMirror added the feature request label 2026-05-04 11:12:28 -05:00

Reference: github-starred/ollama#67648