[GH-ISSUE #6993] llama3.1:70b CPU bottleneck? #4427

Closed
opened 2026-04-12 15:21:50 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @jasonliuspark123 on GitHub (Sep 27, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6993

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

The CPU uses only one core at 100%, while the GPU's cores mostly run at less than 20% utilization.

The model is not responding at a good speed.

I'm wondering if this single-core CPU usage is the bottleneck for performance.

I have read https://github.com/ggerganov/llama.cpp/issues/8684, but have not seen an exact answer.
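If a single saturated CPU thread really is the limiter, one thing worth trying (a sketch, not a confirmed fix for this issue) is raising the CPU thread count through a custom Modelfile. `num_thread` is an Ollama model parameter; the value 16 below is only illustrative and should match the number of physical cores:

```
FROM llama3.1:70b
PARAMETER num_thread 16
```

Build and run the variant with `ollama create llama31-threads -f Modelfile` followed by `ollama run llama31-threads`. Note that this only helps for layers running on the CPU; if the model is fully offloaded to the GPU, thread count matters much less.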

![c24f866aff701f57270e97aa4b667cd](https://github.com/user-attachments/assets/ddf4b9fe-7d60-4a4c-9c10-ee857afecf9d)

![922760ada75a01e64b76b9c323825fd](https://github.com/user-attachments/assets/280d8540-0a1d-4e0d-a93c-75ba8ac1500e)

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.3.10

GiteaMirror added the performance and bug labels 2026-04-12 15:21:50 -05:00
Author
Owner

@jasonliuspark123 commented on GitHub (Sep 27, 2024):

I understand the GPU is doing the actual calculating work. In my case, would it make sense that the CPU hasn't sent enough requests to the GPU to utilize all of its computing power?

Author
Owner

@dhiltgen commented on GitHub (Sep 28, 2024):

Duplicate of #6913

Reference: github-starred/ollama#4427