[PR #15065] [MERGED] mlx: fix KV cache snapshot memory leak #77292

opened 2026-05-05 09:57:41 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15065
Author: @jessegross
Created: 3/25/2026
Status: Merged
Merged: 3/26/2026
Merged by: @jessegross

Base: main ← Head: jessegross/memleak


📝 Commits (1)

  • d432aab mlx: fix KV cache snapshot memory leak

📊 Changes

3 files changed (+9 additions, -16 deletions)


📝 x/mlxrunner/cache/cache.go (+6 -6)
📝 x/mlxrunner/mlx/array.go (+3 -1)
📝 x/mlxrunner/mlx/ops_extra.go (+0 -9)

📄 Description

mlx.Copy shares the backing buffer with its source (via copy_shared_buffer) rather than allocating independent storage. When used to snapshot a slice of the KV cache, the snapshot array holds the entire original cache buffer alive through the shared data pointer — even after eval detaches the computation graph.
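The retention problem described above has a close analogue in plain Go: re-slicing shares the backing array, so a tiny sub-slice keeps the whole original allocation reachable. The sketch below is only an analogy under that assumption. It does not use MLX or ollama APIs, and `snapshotShared` and `snapshotCompact` are hypothetical names standing in for the `Copy`-style and `Contiguous`-style behaviors.

```go
package main

import "fmt"

// snapshotShared is a hypothetical stand-in for a Copy-style snapshot:
// re-slicing shares the backing array, so the entire cache stays reachable
// through the returned slice (visible as its large capacity).
func snapshotShared(cache []float32, n int) []float32 {
	return cache[:n]
}

// snapshotCompact is a hypothetical stand-in for a Contiguous-style snapshot:
// it allocates fresh storage sized to the logical slice and copies into it,
// so the original cache can be garbage-collected once nothing else holds it.
func snapshotCompact(cache []float32, n int) []float32 {
	out := make([]float32, n)
	copy(out, cache[:n])
	return out
}

func main() {
	cache := make([]float32, 1<<20) // 1M float32 elements (4 MiB)
	shared := snapshotShared(cache, 16)
	compact := snapshotCompact(cache, 16)
	fmt.Println(len(shared), cap(shared))   // 16 1048576
	fmt.Println(len(compact), cap(compact)) // 16 16
}
```

The capacity of the shared snapshot exposes how much memory it pins, even though only 16 elements are logically in use.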

Replace Copy with Contiguous in Snapshot and Split. Contiguous allocates a compact buffer when the source buffer is significantly larger than the logical slice (Contiguous::eval checks buffer_size > nbytes + 16384), which is always the case for KV cache slices.
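The size check quoted above can be modeled as a one-line predicate. This is a hypothetical Go sketch of only the `buffer_size > nbytes + 16384` condition as stated in the description; it is not MLX source, ignores any other conditions `Contiguous::eval` may apply (such as layout checks), and `shouldCompact` and `compactSlack` are illustrative names.

```go
package main

import "fmt"

// compactSlack mirrors the 16384-byte margin cited for Contiguous::eval.
const compactSlack = 16384

// shouldCompact is a hypothetical model of the quoted rule: a view whose
// logical size is nbytes, backed by a buffer of bufferSize bytes, gets its
// own compact buffer only when bufferSize > nbytes + compactSlack.
func shouldCompact(bufferSize, nbytes int64) bool {
	return bufferSize > nbytes+compactSlack
}

func main() {
	// A 4 MiB logical view inside a 32 MiB backing buffer is compacted.
	fmt.Println(shouldCompact(32<<20, 4<<20)) // true
	// A view covering the whole buffer keeps sharing it.
	fmt.Println(shouldCompact(32<<20, 32<<20)) // false
}
```

The fixed slack means views that are nearly the full buffer keep sharing storage, while small slices of a large cache, the case this PR targets, always get their own allocation.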


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/ollama#77292