[GH-ISSUE #5680] Extremely slow on Mac M1 chip #29301

Closed
opened 2026-04-22 08:03:22 -05:00 by GiteaMirror · 7 comments
Owner

Originally created by @lulunac27a on GitHub (Jul 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5680

Originally assigned to: @jmorganca on GitHub.

What is the issue?

I tried chatting with Llama from Meta AI. While the answer is generating, my computer becomes very slow and sometimes freezes (for example, the mouse pointer stops moving when I move the trackpad). It takes a few minutes to completely generate an answer to a question. I use an Apple M1 chip with 8 GB of RAM.

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.2.2

GiteaMirror added the bug label 2026-04-22 08:03:22 -05:00

@rick-github commented on GitHub (Jul 13, 2024):

What was the size of the model? Most models are multiple gigabytes in size; if you tried to run one of these with 8 GB of RAM, I imagine it will cause a lot of swap activity. Try running a smaller model (e.g., https://ollama.com/library/qwen2:0.5b) and see if performance improves.
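
For reference, a minimal way to act on this suggestion, assuming a standard Ollama install (qwen2:0.5b is the tag from the link above; it weighs in at a few hundred megabytes):

```
# List locally installed models and their on-disk sizes.
ollama list

# Try a much smaller model to see whether the slowdown is memory
# pressure rather than Ollama itself.
ollama run qwen2:0.5b
```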


@lulunac27a commented on GitHub (Jul 14, 2024):

llama3 at the 8B size


@sudochia commented on GitHub (Jul 14, 2024):

Try https://ollama.com/library/tinyllama and see how it goes?


@igorschlum commented on GitHub (Jul 15, 2024):

@lulunac27a you will need a 16 GB Mac, but TinyLlama is good for learning. You can also use llama3:8b-instruct-q2_K; restart your Mac first and keep few apps open so the maximum amount of memory is left for Ollama and the model.
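
A sketch of that workflow, using the tag from the comment (a q2_K quantization of llama3 8B is roughly 3 GB, so it leaves more headroom in 8 GB of unified memory, at some cost in answer quality):

```
# Fetch and run the heavily quantized llama3 build mentioned above.
ollama pull llama3:8b-instruct-q2_K
ollama run llama3:8b-instruct-q2_K
```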


@dhiltgen commented on GitHub (Jul 23, 2024):

As others have pointed out, llama3 is going to be large for your system if you have anything else using memory.

```
> ollama ps
NAME           ID            SIZE    PROCESSOR  UNTIL
llama3:latest  a6990ed6be41  5.5 GB  100% GPU   4 minutes from now
```

On an 8 GB Mac, only ~5.7 GB is available for VRAM usage, which does not leave much buffer, so while it does work, it is pushing the limits of your system.
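
One way to confirm that the model is spilling into swap (which would explain the system-wide freezes) is to watch macOS swap usage while an answer is generating; `vm.swapusage` is a standard macOS sysctl, and the output below is illustrative:

```
# Run before and after loading the model; if "used" grows by
# gigabytes during generation, the model does not fit in RAM.
sysctl vm.swapusage
# vm.swapusage: total = 4096.00M  used = 3210.50M  free = 885.50M  (encrypted)
```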


@piotrszczesniak commented on GitHub (Sep 25, 2024):

> Try https://ollama.com/library/tinyllama and see how it goes?

@sudochia - this model is way quicker to respond, thanks!


@igorschlum commented on GitHub (Sep 26, 2024):

Hi @piotrszczesniak, you can try pulling [llama3.2](https://ollama.com/library/llama3.2), which is a smaller LLM and may respond better than tinyllama.
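
For completeness, llama3.2 ships in 1B and 3B sizes (the bare llama3.2 tag resolves to the 3B build); the 1B variant, at roughly 1.3 GB, is the safest fit on an 8 GB Mac:

```
ollama run llama3.2:1b
```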
