[GH-ISSUE #3404] Command R model runs very slowly on Mac #48606

Closed
opened 2026-04-28 08:56:13 -05:00 by GiteaMirror · 10 comments

Originally created by @Zig1375 on GitHub (Mar 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3404

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

The Command R model runs very slowly on a Mac (with an M2 Pro CPU and 32GB of RAM). It utilizes only 80-90% of the CPU, out of a possible 1200% (which results in processing about 1 token every 20-30 seconds).

However, on a Windows 11 machine (equipped with an Nvidia 4070 GPU), it runs very quickly (processing about 5-10 tokens per second).

Previously, with other models, the situation was the opposite. The Mac ran much faster, even surpassing the performance of the Windows machine.

What did you expect to see?

The Mac should run faster, at least a few tokens per second.

Steps to reproduce

Install and run the Command R model on a Mac M2 Pro.
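
A minimal repro sketch (assumes the default tag; the maintainer's `ollama ps` output later in this thread shows `command-r:latest`):

```
# Pull Command R, then run it; --verbose prints timing statistics
# (prompt and generation tokens/sec) after each response.
ollama pull command-r
ollama run --verbose command-r "Summarize the plot of Hamlet in two sentences."
```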

Are there any recent changes that introduced the issue?

No response

OS

macOS

Architecture

arm64

Platform

No response

Ollama version

0.1.30

GPU

Apple

GPU info

No response

CPU

Apple

Other software

No response

GiteaMirror added the bug label 2026-04-28 08:56:13 -05:00

@ylca0 commented on GitHub (Apr 3, 2024):

This appears to be due to insufficient memory: #2324

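A quick way to check that diagnosis (a sketch; the `PROCESSOR` column shown here follows the maintainer's `ollama ps` output later in this thread):

```
# Total physical RAM on macOS, in bytes:
sysctl hw.memsize

# With the model loaded, show how its layers are split between CPU and GPU.
# A split like "6%/94% CPU/GPU" means the model is almost fully offloaded;
# "100% CPU" means it did not fit under the GPU memory cap.
ollama ps
```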

@Zig1375 commented on GitHub (Apr 3, 2024):

I've just checked this... on my Mac there is about 10GB of RAM free!
Why does it not use all the memory?
Also, I have previously used models bigger than this one, and they ran quite fast.


@igorschlum commented on GitHub (Apr 6, 2024):

How much memory do you have in total on your computer? I have 32GB on my Mac. I can run a test and let you know whether it is also slow on my machine.


@ylca0 commented on GitHub (Apr 6, 2024):

@igorschlum I have 36GB of memory. Ollama falls back to pure CPU computation when I run the 70B 4-bit model, and the speed is around 20-30 seconds per token. I tried to run it with llama.cpp, but the memory usage was too high and the computer crashed.

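Rough arithmetic supports the memory diagnosis: common 4-bit quantization formats store a little over 4 bits per weight once scaling metadata is counted (Q4_0 works out to about 4.5 bits), so a 70B model needs roughly 70e9 × 4.5 / 8 ≈ 39GB for the weights alone, before any KV cache. That already exceeds 36GB of total RAM, which is consistent with both Ollama's CPU fallback and the crash under llama.cpp.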

@sourcedexter commented on GitHub (Apr 16, 2024):

I have the same issue on a Mac M1: plenty of memory is free, yet generation is very, very slow. I have tested other models like Llama2 and codegemma, and those are pretty fast.


@SmileSydney commented on GitHub (Jun 1, 2024):

Having the same issue on an M2 with 64GB. Activity Monitor shows only 9GB used.


@dhiltgen commented on GitHub (Jun 1, 2024):

Can you try again on the latest release? On a 32GB M2 I'm seeing ~7 TPS and:

```
% ollama ps
NAME            	ID          	SIZE 	PROCESSOR     	UNTIL
command-r:latest	b8cdfff0263c	24 GB	6%/94% CPU/GPU	4 minutes from now
```

Apple reserves a portion of RAM for the OS and won't allow VRAM beyond a certain level.

I haven't tried it, but you can experiment with `sudo sysctl iogpu.wired_limit_mb=XXXX` to allow more GPU usage; however, you may starve the OS and cause instability.

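A concrete version of that experiment (a sketch only; the value below assumes a 32GB Apple Silicon machine on a recent macOS release and leaves several GB for the OS — the setting does not persist across reboots, and setting it too high can destabilize the system):

```
# Read the current GPU wired-memory limit; 0 means the macOS default cap.
sysctl iogpu.wired_limit_mb

# Temporarily allow the GPU to wire up to ~26GB on a 32GB machine.
sudo sysctl iogpu.wired_limit_mb=26624
```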

@Zig1375 commented on GitHub (Jun 2, 2024):

I'll be able to test it tomorrow.


@Zig1375 commented on GitHub (Jun 2, 2024):

@dhiltgen
Wow! It works very very fast now! Thank you!


@igorschlum commented on GitHub (Jun 2, 2024):

@Zig1375 Good news! If the issue is solved, could you please close it?
