[GH-ISSUE #1740] PowerInfer Enhancement #994

Closed
opened 2026-04-12 10:42:30 -05:00 by GiteaMirror · 1 comment

Originally created by @iplayfast on GitHub (Dec 29, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1740

I keep seeing posts about PowerInfer https://github.com/SJTU-IPADS/PowerInfer which (if I understand it) keeps frequently activated ("hot") neurons in GPU memory and rarely activated ("cold") neurons in CPU memory. The authors claim this results in up to an 11x speedup.
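The core idea can be sketched roughly as follows. This is a minimal illustration of the hot/cold split, not PowerInfer's actual code: neurons are profiled for activation frequency, and the frequently firing ones are assigned to the GPU while the long tail stays on the CPU. The function name, threshold, and toy profile are all hypothetical.

```python
# Illustrative sketch (NOT PowerInfer's real implementation): partition a
# layer's neurons into a GPU-resident "hot" set and a CPU-resident "cold"
# set based on measured activation frequency. The 0.2 threshold is an
# arbitrary placeholder.

def partition_neurons(activation_counts, total_tokens, hot_threshold=0.2):
    """Return (hot, cold) lists of neuron indices.

    activation_counts[i] is how many of `total_tokens` profiling tokens
    activated neuron i. Neurons firing on more than `hot_threshold` of
    tokens go to the GPU; the rarely-firing remainder stays in CPU memory.
    """
    hot, cold = [], []
    for i, count in enumerate(activation_counts):
        if count / total_tokens > hot_threshold:
            hot.append(i)   # frequently activated -> keep in GPU memory
        else:
            cold.append(i)  # rarely activated -> offload to CPU memory
    return hot, cold


# Toy profile over 100 tokens: neurons 0 and 2 fire often, 1 and 3 rarely.
hot, cold = partition_neurons([90, 3, 55, 1], total_tokens=100)
print(hot, cold)  # [0, 2] [1, 3]
```

At inference time, the hot set is served from the GPU while cold-neuron work runs on the CPU, which is why the speedup depends heavily on how skewed the activation distribution actually is.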

It looks like models need to be converted to use this, so it's a pain. BUT.... an 11x speedup.

I wonder if the model could be converted automatically, so that after download it gets revised and the converted version is stored.

Anyway, just for interest's sake. **11x speedup!!!**


@iplayfast commented on GitHub (Dec 29, 2023):

I see that llama.cpp has already discussed this, and the consensus is:

  1. The results are cherry-picked; actual results will be a 3-4x speedup.
  2. The code is not complete and needs to be refined.
  3. They think it's interesting and will integrate some of the ideas once it's a bit more stable.

So I guess it's coming to llama.cpp in the distant future.

Reference: github-starred/ollama#994