[GH-ISSUE #8306] Improve speed on cpu-only #51830

Closed
opened 2026-04-28 21:02:20 -05:00 by GiteaMirror · 1 comment

Originally created by @ErfolgreichCharismatisch on GitHub (Jan 4, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8306

What is the issue?

Llamafile is much faster on CPU than Ollama: what takes Ollama 33 minutes takes llamafile 3 minutes with the same model.
Unfortunately, llamafile crashes after being reused and spins its wheels at 100% CPU for hours.

I'd rather use a stable Ollama, but speed on CPU needs work.
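
To put numbers on the gap, here is a minimal sketch of a tokens-per-second measurement against a local Ollama server (assuming the default port 11434; the model name `llama3.2` is a placeholder, substitute whatever model you were comparing):

```python
# Minimal sketch: measure Ollama's generation speed via its HTTP API.
# Assumes a local server on the default port; the model name is a placeholder.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
payload = {
    "model": "llama3.2",  # placeholder; substitute your model
    "prompt": "Explain cache blocking in one paragraph.",
    "stream": False,
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# The final response includes eval_count (tokens generated) and
# eval_duration (nanoseconds spent generating them).
tokens = result["eval_count"]
seconds = result["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tok/s")
```

Timing the same prompt through llamafile gives an apples-to-apples tok/s comparison rather than wall-clock anecdotes.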

OS

Linux

CPU

Intel

Other resources

See also this post from this source: https://www.reddit.com/r/LocalLLaMA/comments/1e6v8qb/new_cpu_inference_speed_gains_of_30_to_500_via/

New CPU inference speed gains of 30% to 500% via Llamafile

https://youtu.be/-mRi-B3t6fA

This video of a talk given a few days ago discusses techniques used to increase CPU inference speed.

Of particular interest to me are the Threadripper speedups mentioned at around 10:30:

"if you have a threadripper you're going to see better performance than ever, almost like a GPU"

The slide shows a speedup from 300 tok/s to 2400 tok/s, which is, if I'm not mistaken, a 700% gain (an 8x throughput increase).

Granted, it's not too meaningful without knowing which model they were testing it on, but it's still great news, especially together with the intro speaker's position on the importance of open-source AI.
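
The llamafile CPU gains are generally attributed to rewritten matrix-multiplication kernels (cache tiling, loop unrolling, SIMD, one thread per core). As a purely illustrative sketch of the tiling idea only, here is a blocked matmul in Python/NumPy; the real kernels are hand-tuned C/C++, and this version just shows the loop structure that keeps tiles resident in cache while they are reused:

```python
# Purely illustrative sketch of cache blocking (tiling) for matmul,
# one of the kernel-level techniques behind llamafile's CPU gains.
# Real kernels are hand-tuned C/C++ with SIMD; this shows only the
# loop structure that improves cache reuse.
import numpy as np

def matmul_blocked(a: np.ndarray, b: np.ndarray, block: int = 64) -> np.ndarray:
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((n, m), dtype=a.dtype)
    # Work on block x block tiles so each tile of a, b, and c
    # stays in cache while it is reused across the inner loop.
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                c[i:i+block, j:j+block] += (
                    a[i:i+block, p:p+block] @ b[p:p+block, j:j+block]
                )
    return c

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((256, 256), dtype=np.float32)
    b = rng.standard_normal((256, 256), dtype=np.float32)
    assert np.allclose(matmul_blocked(a, b), a @ b, atol=1e-3)
```

In a compiled kernel, the same tiling combined with vectorization and per-core threads is plausibly where the "almost like a GPU" Threadripper numbers come from.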

GiteaMirror added the feature request label 2026-04-28 21:02:20 -05:00

@rick-github commented on GitHub (Jan 5, 2025):

dupe #8305


Reference: github-starred/ollama#51830