[PR #12808] darwin: improve free memory reporting #45210

Open
opened 2026-04-25 00:54:38 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12808
Author: @dhiltgen
Created: 10/28/2025
Status: 🔄 Open

Base: main ← Head: darwin_mem


📝 Commits (1)

  • 94e62f3 darwin: improve free memory reporting

📊 Changes

5 files changed (+85 additions, -12 deletions)

View changed files

📝 discover/gpu_info_darwin.m (+8 -4)
📝 discover/runner.go (+0 -5)
➕ llama/patches/0030-reduce-metal-free-memory-based-on-system-free-memory.patch (+49 -0)
📝 llama/patches/0032-interleave-multi-rope.patch (+2 -2)
📝 ml/backend/ggml/ggml/src/ggml-metal/ggml-metal-device.m (+26 -1)

📄 Description

We have been incorrectly reporting free memory, which can lead to stability problems when excessive VRAM is allocated: graphics flickering and windows failing to render properly, which only clear up after a reboot.

Before this change, on a 128 GiB M3 Max Mac, I saw log lines like the following:

time=2025-10-28T13:30:11.774-07:00 level=INFO source=types.go:40 msg="inference compute" id=0 library=Metal compute=0.0 name=Metal description="Apple M3 Max" libdirs="" driver=0.0 pci_id=00:00.0 type=discrete total="96.0 GiB" available="96.0 GiB"
...
time=2025-10-28T13:30:37.564-07:00 level=INFO source=server.go:455 msg="system memory" total="128.0 GiB" free="69.2 GiB" free_swap="0 B"

However, other tools show ~77% memory used and only ~29 GiB free, which is more accurate.
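As a rough sanity check on those numbers, and as a sketch of how a more accurate "free" figure can be derived on Darwin: tools like vm_stat break memory into free, inactive, wired, and other page categories, and counting only genuinely reclaimable pages gives a much smaller figure than "total minus app-resident". The page counts and the exact formula below are illustrative assumptions, not the PR's actual implementation (which lives in discover/gpu_info_darwin.m):

```go
package main

import "fmt"

// freeBytes sketches one plausible definition of "free" memory on
// macOS: free pages plus inactive (reclaimable) pages, times the page
// size. This is an assumption for illustration; the real host
// statistics come from mach APIs such as host_statistics64.
func freeBytes(freePages, inactivePages, pageSize uint64) uint64 {
	return (freePages + inactivePages) * pageSize
}

func main() {
	const pageSize = 16384 // Apple Silicon uses 16 KiB pages
	// Hypothetical counts chosen to land near the "after" log (~29 GiB).
	free := uint64(1_500_000)
	inactive := uint64(400_000)
	gib := float64(freeBytes(free, inactive, pageSize)) / (1 << 30)
	fmt.Printf("%.1f GiB\n", gib) // prints "29.0 GiB"
}
```

On a 128 GiB machine, ~29 GiB free is consistent with the ~77% used that other tools report: 128 × (1 − 0.77) ≈ 29.4 GiB.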

With this change:

time=2025-10-28T13:32:14.566-07:00 level=INFO source=types.go:40 msg="inference compute" id=0 library=Metal compute=0.0 name=Metal description="Apple M3 Max" libdirs="" driver=0.0 pci_id=00:00.0 type=discrete total="96.0 GiB" available="29.0 GiB"
...
time=2025-10-28T13:32:28.495-07:00 level=INFO source=server.go:455 msg="system memory" total="128.0 GiB" free="29.1 GiB" free_swap="0 B"

Before this change, we would push the OS toward swapping more aggressively, up to a point, but that sometimes caused stability problems. With this change, systems that are low on free memory will load fewer layers onto the GPU and use more CPU, so there may be a performance impact in cases where the swapping would have worked.
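The trade-off above can be sketched numerically: with less reported free memory, fewer transformer layers fit on the GPU and the remainder run on the CPU. The per-layer size, layer count, and the simple division below are illustrative assumptions, not ollama's actual scheduler logic:

```go
package main

import "fmt"

// gpuLayers returns how many layers fit in the reported free VRAM,
// capped at the model's total layer count. Purely illustrative.
func gpuLayers(freeVRAMGiB, perLayerGiB float64, totalLayers int) int {
	n := int(freeVRAMGiB / perLayerGiB)
	if n > totalLayers {
		n = totalLayers
	}
	return n
}

func main() {
	const perLayer = 1.2 // GiB per layer, hypothetical
	const layers = 63    // hypothetical large-model layer count
	// Before: 96 GiB reported free -> every layer offloaded to GPU.
	fmt.Println("before:", gpuLayers(96.0, perLayer, layers))
	// After: 29 GiB reported free -> partial offload, rest on CPU.
	fmt.Println("after:", gpuLayers(29.0, perLayer, layers))
}
```

Partial offload is slower than full GPU residency but avoids over-allocating VRAM, which is the stability problem this PR targets.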


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 00:54:38 -05:00

Reference: github-starred/ollama#45210