[GH-ISSUE #10862] Ollama Multi-threading on aarch64 CPU compared to Llama.cpp #84821

Open
opened 2026-05-09 21:49:21 -05:00 by GiteaMirror · 0 comments

Originally created by @luentong on GitHub (May 26, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10862

I ran DeepSeek-R1-Distill-Qwen-7B (Float16) inference with both Ollama and Llama.cpp on a machine with 24 physical cores.

Ollama creates 37 threads. 24 of them show non-zero CPU usage, but mostly only around 10-15% each.

![Image](https://github.com/user-attachments/assets/603695e0-734e-4d92-95a4-ca3e3d764723)

Llama.cpp creates 25 threads. 24 of them run at close to 100% CPU usage; the 25th is presumably the main thread waiting for the other 24.

![Image](https://github.com/user-attachments/assets/21e43bdc-31cb-4dc5-8ea6-92c9f4128cf0)
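
The per-thread figures above can be cross-checked without a process viewer. The following is a minimal sketch, assuming Linux, that samples each thread's cumulative CPU ticks from `/proc/<pid>/task/<tid>/stat` twice and reports which threads did work in between. The helper names (`parse_stat_line`, `thread_cpu_ticks`, `busy_threads`) are made up for illustration and are not part of Ollama or Llama.cpp.

```python
from pathlib import Path


def parse_stat_line(line):
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The name is wrapped in parentheses and may itself contain spaces or ')',
    # so locate it via the first '(' and the last ')'.
    lpar = line.index("(")
    rpar = line.rindex(")")
    name = line[lpar + 1:rpar]
    # Fields after the name start at field 3 (state);
    # utime is field 14 and stime is field 15 (1-indexed, see proc(5)).
    rest = line[rpar + 1:].split()
    utime, stime = int(rest[11]), int(rest[12])
    return name, utime + stime


def thread_cpu_ticks(pid):
    """Map thread id -> (name, cumulative CPU ticks) for every thread of pid."""
    result = {}
    for task in Path(f"/proc/{pid}/task").iterdir():
        try:
            result[int(task.name)] = parse_stat_line((task / "stat").read_text())
        except OSError:
            # The thread exited between listing and reading; skip it.
            continue
    return result


def busy_threads(before, after, min_ticks=1):
    """Map thread id -> ticks consumed between two samples, dropping idle threads."""
    deltas = {}
    for tid, (name, ticks) in after.items():
        previous = before.get(tid, (name, 0))[1]
        delta = ticks - previous
        if delta >= min_ticks:
            deltas[tid] = delta
    return deltas
```

Taking two `thread_cpu_ticks(pid)` samples a few seconds apart during generation and passing them to `busy_threads` gives the ticks each thread consumed. Dividing by `os.sysconf("SC_CLK_TCK")` times the interval in seconds turns that into a utilization fraction per thread.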

Although the CPU usage figures differ significantly, inference performance is similar: around 9 tokens/s for Ollama and around 9-10 tokens/s for Llama.cpp.
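
To make the throughput comparison repeatable, the Ollama side can be measured with an explicit thread count. This is a rough sketch, assuming a local Ollama server on its default port and the `num_thread` option and `eval_count` / `eval_duration` response fields of the `/api/generate` endpoint. The model tag in the commented usage is a placeholder, not the exact tag used for the numbers above, and the function names are hypothetical.

```python
import json
import urllib.request


def build_request(model, prompt, num_thread):
    """Build an /api/generate payload that pins the number of compute threads."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a token stream
        "options": {"num_thread": num_thread},
    }


def tokens_per_second(response):
    """Compute generation speed; eval_duration is reported in nanoseconds."""
    return response["eval_count"] * 1e9 / response["eval_duration"]


def generate(payload, url="http://localhost:11434/api/generate"):
    """Send the payload to a running Ollama server and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# Example usage (requires a running server; the model tag is a placeholder):
# reply = generate(build_request("my-model-tag", "Hello", num_thread=24))
# print(tokens_per_second(reply))
```

Running the same prompt through Llama.cpp with its `-t 24` thread flag should then give a like-for-like comparison of tokens/s at the same thread count.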

Does anyone know what abstractions Ollama puts on top of Llama.cpp to cause this? Thank you.

Reference: github-starred/ollama#84821