[GH-ISSUE #1692] Mac OS Sonoma crashes completely when loading LLM #953

Closed
opened 2026-04-12 10:39:12 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @sanctimon on GitHub (Dec 24, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1692

I have pulled the model (dolphin-mixtral:latest) and when I attempt to run, the entire machine freezes. A few minutes later it restarts.

Specs: MacBook Pro M1 Pro. 16GB RAM.

With Activity Monitor on, it seems to be filling up the RAM quite quickly before the crash.

GiteaMirror added the bug label 2026-04-12 10:39:12 -05:00

@igorschlum commented on GitHub (Dec 24, 2023):

Hi @sanctimon
When you run
`ollama --version`
what version number shows up?
If it's 0.0.0, that is because you installed Ollama with `brew install ollama`,
and the brew script is inconsistent and installs an older version.
Download the app from the ollama.ai home page and retry.


@BruceMacD commented on GitHub (Dec 24, 2023):

Hi @sanctimon, Ollama shouldn't crash your computer (that part is a bug), but in this case you don't have enough RAM to run `dolphin-mixtral`; Mixtral-based models require a good amount of memory, probably around ~30GB at a guess.
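
As a rough back-of-the-envelope check (my own estimate, not Ollama's exact accounting): Mixtral 8x7B keeps all experts resident, roughly 46.7B parameters in total, so even a 4-bit quantization is well beyond 16GB:

```python
def estimate_model_ram_gb(n_params_billion: float, bits_per_weight: float,
                          overhead_gb: float = 2.0) -> float:
    """Rough RAM estimate: quantized weights plus a fixed overhead for the
    KV cache and runtime buffers (both figures are assumptions)."""
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / 1e9 + overhead_gb

# Mixtral 8x7B has ~46.7B parameters in total; only 2 experts are active
# per token, but all of them must stay loaded.
mixtral_params = 46.7
print(f"4-bit (Q4): ~{estimate_model_ram_gb(mixtral_params, 4):.0f} GB")
print(f"8-bit (Q8): ~{estimate_model_ram_gb(mixtral_params, 8):.0f} GB")
```

The 4-bit figure lands in the mid-20s of GB, consistent with the ~30GB guess above and clearly more than a 16GB machine can hold.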


@sanctimon commented on GitHub (Dec 24, 2023):

I see, thank you both.

  1. I agree that Ollama should not cause the equivalent of a BSOD on Macs; it should simply throw an error saying there is insufficient memory to run the model.
  2. Is there no way to use cache or swap for these cases, particularly when the SSD in these new MacBook Pros is so frightfully fast?

@BruceMacD commented on GitHub (Dec 29, 2023):

@sanctimon Good question. Running a model should currently do some memory mapping, which will use your storage (SSD). However, for token inference the whole model has to pass through the CPU/GPU, since every tensor in the model is involved in generating each token.
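
A small sketch of why memory mapping alone doesn't help here (the file and sizes below are stand-ins, not Ollama's actual weights format): mapping a file is nearly free, but touching every page, as per-token inference does, streams the whole file through memory anyway.

```python
import mmap
import os
import tempfile

# Stand-in for a model weights file: 1 MiB of fake tensor data.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0,
                                      access=mmap.ACCESS_READ) as m:
    # The mapping is lazy: no data is read from disk until pages are touched.
    # Per-token inference touches every tensor, i.e. every page, so a model
    # larger than RAM ends up thrashing rather than quietly running off SSD.
    total = sum(m[i] for i in range(0, len(m), mmap.PAGESIZE))
    print(f"touched {len(m) // mmap.PAGESIZE} pages of {mmap.PAGESIZE} bytes")
```

So the SSD does get used via the page cache, but it only hides the cost for data you rarely touch; inference touches everything, every token.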


@igorschlum commented on GitHub (Jan 9, 2024):

@sanctimon did you try 0.1.18? Memory handling is a little different, and it could tell you when you don't have enough memory.


@pdevine commented on GitHub (Jan 25, 2024):

@sanctimon Sorry that you saw this. There have been a bunch of fixes here for improving memory management, and I think it should be working now. Can you try again w/ 0.1.20 (or 0.1.21 when it comes out)?

I'll go ahead and close the issue for now.


Reference: github-starred/ollama#953