[GH-ISSUE #5923] Slow Model Loading Speed on macOS System #3697

Open
opened 2026-04-12 14:30:45 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @ghost on GitHub (Jul 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5923

What is the issue?

I am experiencing slow model loading speeds when using Ollama on my macOS system. Here are the specifications of my setup:

macOS Version: 14.5
Processor: M3 Max
Memory: 128GB
Storage: 2TB (with performance on par with the 8TB version)
Ollama version: 0.2.8

Despite having sufficient hardware capabilities, the model loading speed typically hovers around 700MB/s. During the loading process, I do not observe any component (CPU, Disk, Memory, GPU) being fully utilized or experiencing high usage. Could you please help me understand the reason for this bottleneck and suggest any potential solutions or optimizations?

Thank you for your assistance.

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.2.8
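
The ~700 MB/s figure can be sanity-checked against what the SSD actually sustains on the same file. Below is a minimal sequential-read benchmark sketch in Python; the blob path is a placeholder, and on a default install the model weights are assumed to live under ~/.ollama/models/blobs.

```python
# Sequential-read benchmark: measures how fast this machine can stream a
# model blob off disk, to compare against the ~700 MB/s seen during loading.
# The path below is a placeholder -- substitute one of the large files from
# ~/.ollama/models/blobs (assumed default location on macOS).
import time

BLOB = "/Users/me/.ollama/models/blobs/sha256-<placeholder>"
CHUNK = 8 * 1024 * 1024  # 8 MiB reads

total = 0
start = time.perf_counter()
with open(BLOB, "rb", buffering=0) as f:
    while True:
        data = f.read(CHUNK)
        if not data:
            break
        total += len(data)
elapsed = time.perf_counter() - start
print(f"read {total / 1e9:.1f} GB in {elapsed:.1f} s "
      f"({total / elapsed / 1e6:.0f} MB/s)")
```

A file larger than RAM, or a freshly rebooted machine, gives the most honest number; otherwise macOS may serve much of the file from the page cache.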

GiteaMirror added the bug label 2026-04-12 14:30:45 -05:00
Author
Owner

@igorschlum commented on GitHub (Jul 24, 2024):

Hi @Yuhuadi,

I have a MacStation with 192 GB of RAM. When I run llama3:70b in the terminal and check the Activity Monitor app, the average data transfer rate is 920 MB/s.

The speed of 920 MB/s is quite good for an internal SSD on macOS. It indicates that your system is efficiently handling the data transfer requirements for running llama3:70b, which should contribute to smooth performance.
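
For anyone who wants the same measurement without watching Activity Monitor by hand, here is a rough sketch that samples system-wide disk read throughput while a model load is running. It relies on the third-party psutil package; run it in a separate terminal during `ollama run llama3:70b`.

```python
# Prints system-wide disk read throughput once per second (Ctrl-C to stop).
# Requires the third-party psutil package: pip install psutil
import time
import psutil

prev = psutil.disk_io_counters().read_bytes
while True:
    time.sleep(1.0)
    cur = psutil.disk_io_counters().read_bytes
    print(f"{(cur - prev) / 1e6:8.0f} MB/s read")
    prev = cur
```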

Author
Owner

@ghost commented on GitHub (Jul 28, 2024):

> Hi @Yuhuadi,
>
> I have a MacStation with 192 GB of RAM. When I run llama3:70b in the terminal and check the Activity Monitor app, the average data transfer rate is 920 MB/s.
>
> The speed of 920 MB/s is quite good for an internal SSD on macOS. It indicates that your system is efficiently handling the data transfer requirements for running llama3:70b, which should contribute to smooth performance.

As far as I know, the bandwidth of Mac's hard drives far exceeds this number. Even when connected externally via USB-C, speeds can reach several GB/s.

Author
Owner

@igorschlum commented on GitHub (Jul 28, 2024):

@Yuhuadi Ollama doesn't only read the file from the hard drive; the data also has to be transferred to the GPU, and several other tasks have to be done. It would be great if it were faster, but the limitation may be in llama.cpp.
It would be interesting to test other macOS apps that can run LLMs, see whether they load models faster than Ollama, and then improve Ollama.
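
For a reproducible comparison along those lines, one option is to time the load step itself through Ollama's HTTP API: the final /api/generate response reports a load_duration field in nanoseconds. A minimal sketch, assuming the server is at the default localhost:11434, the model tag is llama3:70b, and that a request with keep_alive set to 0 unloads the model so the next request measures a cold load (worth confirming against the API docs for your version):

```python
# Times a cold model load via Ollama's HTTP API and prints load_duration.
# Assumes the default server address and that keep_alive=0 unloads the model.
import json
import urllib.request

URL = "http://localhost:11434/api/generate"
MODEL = "llama3:70b"

def post(payload):
    req = urllib.request.Request(
        URL, data=json.dumps(payload).encode(), method="POST",
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Unload the model (if loaded), then trigger a fresh load with a tiny prompt.
post({"model": MODEL, "keep_alive": 0, "stream": False})
info = post({"model": MODEL, "prompt": "hi", "stream": False})
print(f"load_duration: {info.get('load_duration', 0) / 1e9:.1f} s")
```

Running the same script against different Ollama versions (or alongside another runner's own timing output) would make the "is it faster elsewhere" question measurable.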

Reference: github-starred/ollama#3697