[GH-ISSUE #275] Slow performance on Intel CPU #118

Closed
opened 2026-04-12 09:39:10 -05:00 by GiteaMirror · 12 comments
Owner

Originally created by @jmorganca on GitHub (Aug 4, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/275

When running on an i7-6700K CPU with 32 GB of memory, performance was very slow:

```
ollama run wizard-vicuna --verbose
>>> Hello

I hope you're doing well today. May I know your name and purpose of calling?

total duration:       1m57.311123082s
load duration:        3.703261258s
sample count:         21 token(s)
sample duration:      11.928ms
sample rate:          1760.56 tokens/s
prompt eval count:    13 token(s)
prompt eval duration: 44.866549s
prompt eval rate:     0.29 tokens/s
eval count:           20 token(s)
eval duration:        1m8.72493s
eval rate:            0.29 tokens/s
```
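As a quick sanity check, each "rate" line in a `--verbose` report is simply count divided by duration. A small Python snippet (using only the numbers quoted above) reproduces them:

```python
# Each "rate" in ollama's --verbose output is count / duration.
# The inputs below are taken verbatim from the report above.
def rate(tokens: int, seconds: float) -> float:
    """Tokens per second."""
    return tokens / seconds

prompt_eval = rate(13, 44.866549)   # prompt eval: 13 tokens / 44.87 s
eval_rate = rate(20, 68.72493)      # generation: 20 tokens / 1m8.72s
sample = rate(21, 0.011928)         # sampling: 21 tokens / 11.928 ms

print(f"prompt eval rate: {prompt_eval:.2f} tokens/s")  # 0.29
print(f"eval rate:        {eval_rate:.2f} tokens/s")    # 0.29
print(f"sample rate:      {sample:.2f} tokens/s")       # 1760.56
```

Sampling is several orders of magnitude faster than evaluation here, so the time is going into prompt processing and generation on the CPU, not into token sampling.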
GiteaMirror added the bug and performance labels 2026-04-12 09:39:10 -05:00
Author
Owner

@BSChuang commented on GitHub (Aug 4, 2023):

Some additional PC information:
CPU: Intel® Core™ i7-6700 CPU @ 3.40GHz × 8
RAM: 32.0 GiB
GPU: Mesa Intel® HD Graphics 530 (SKL GT2)
OS: Ubuntu 22.04.2 LTS

Our initial guess was that the GPU is too weak, but the LLM isn't configured to use the GPU (as of yet), and the GPU isn't under any load during evaluation, so that is most likely not the issue.
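One way to confirm the workload is CPU-bound is to watch aggregate CPU utilization while a prompt is evaluating. A minimal, Linux-only sketch (it assumes `/proc/stat` is readable; on other platforms a tool like `top` serves the same purpose):

```python
import time

def cpu_sample():
    # First line of /proc/stat: "cpu  user nice system idle iowait irq ..."
    with open("/proc/stat") as f:
        fields = [int(x) for x in f.readline().split()[1:]]
    idle = fields[3] + fields[4]  # idle + iowait jiffies
    return sum(fields), idle

# Sample twice, one second apart, and report how busy the CPU was overall.
total_a, idle_a = cpu_sample()
time.sleep(1.0)
total_b, idle_b = cpu_sample()

busy = 100.0 * (1 - (idle_b - idle_a) / (total_b - total_a))
print(f"CPU busy over the last second: {busy:.1f}%")
```

If this sits near 100% during generation while a GPU monitor such as `intel_gpu_top` (from the `intel-gpu-tools` package) shows the iGPU idle, the run is CPU-bound, consistent with the observation above.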


@BSChuang commented on GitHub (Aug 4, 2023):

I believe the problem is with Ubuntu. I downloaded Ollama on Windows and have been seeing significant performance increases. I also tested Ollama on WSL (also Ubuntu) and saw similarly slow performance, akin to the initial test.

```
./ollama run wizard-vicuna --verbose
>>> hello
Welcome to the chatbot. How may I assist you?

total duration:       9.3766134s
load duration:        1.9104ms
sample count:         14 token(s)
sample duration:      10.419ms
sample rate:          1343.70 tokens/s
prompt eval count:    1 token(s)
eval count:           14 token(s)
eval duration:        9.361045s
eval rate:            1.50 tokens/s
```
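To put a number on "significant performance increases", the eval rates from the two runs quoted in this thread can be compared directly (a quick calculation from the reported counts and durations, nothing more):

```python
# Eval rates from the two --verbose reports in this thread (count / duration).
linux_rate = 20 / 68.72493     # first report on Ubuntu: ~0.29 tokens/s
windows_rate = 14 / 9.361045   # this report on Windows: ~1.50 tokens/s

speedup = windows_rate / linux_rate
print(f"Windows run is about {speedup:.1f}x faster")  # ~5.1x
```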

@boneitis commented on GitHub (Aug 6, 2023):

Nothing substantial to add; just wanted to chime in to say I have sort-of similar hardware and would be happy to help with having tests also run on my hardware.

Don't know if this is to be expected (as I'm pretty new to tinkering with AI chat bots, let alone using them), but I'm getting similar crunch times running up to minutes on my deployment:

llama2-uncensored, 7B
i7-7700 (non-K)
~~16~~ 32 GB memory
RTX 2060 12GB (although my understanding is Ollama doesn't currently tap into GPU)
Mint 21.2 (offshoot of Ubuntu 22 LTS)

~~I may upgrade to 32 GB if I'm getting enough kicks out of playing with these bots.~~ (I have upgraded to 32 GB.) If it helps with Ollama project dev/testing, no problem swapping out models and stuff. For the time being, I intend to leave it running on a fairly active yet casual Discord server I administer.


@jmorganca commented on GitHub (Oct 28, 2023):

Hi folks, this has been open for quite a while, and there have been quite a few improvements to Ollama's performance – including Nvidia GPU support. I'll close this for now, but feel free to re-open if you are still seeing painfully slow responses (<1 t/s) on modern or semi-modern hardware.


@paulwababu commented on GitHub (Jan 2, 2024):

I'm still getting similar issues even when running small models like 3B models. I'm using Ubuntu and have 32 GB of RAM on Google Cloud.


@uptopoint commented on GitHub (Jan 30, 2024):

I am having the same problem. I have a 3.7 GHz CPU, but Ollama takes 40 seconds just to tell a joke. Friends of mine run Ollama on Windows on much slower CPUs and it works much better. How do I find out what the issue is? There is some issue here for sure.


@tak099 commented on GitHub (Feb 18, 2024):

How to increase the speed of the response?


@tak099 commented on GitHub (Feb 18, 2024):

<img width="805" alt="image" src="https://github.com/ollama/ollama/assets/143394573/255a4b10-a9c6-43c5-8fb1-5c5cdd8e99eb">

This is insane...

@OmarZidan1997 commented on GitHub (Feb 29, 2024):

Getting the same as the people above. I bought an Intel laptop for $2.2k with a 13th-gen i7 and 32 GB RAM. It takes ages to run a simple command. I think it has to do with Ubuntu being restricted from taking control of most of the RAM, unlike Windows. I am not an Ubuntu expert but will research some configuration possibilities.


@protono commented on GitHub (Mar 22, 2024):

```
11th Gen Intel(R) Core(TM) i5-1145G7 @ 2.60GHz
6.6.19-1-MANJARO
ollama server llama2 --verbose
>>> hello
total duration: 3.698377939s
```


@OmarZidan1997 commented on GitHub (Mar 22, 2024):

> `11th Gen Intel(R) Core(TM) i5-1145G7 @ 2.60GHz` `6.6.19-1-MANJARO` `ollama server llama2 --verbose` `>>> hello` `total duration: 3.698377939s`

Are you running on GPU or mainly CPU power?


@protono commented on GitHub (Mar 22, 2024):

I'm using a Lenovo T14 Gen 2 laptop with Iris Xe graphics.

Reference: github-starred/ollama#118