[PR #10291] [MERGED] ggml: Free ggml_backend_buffer_t when releasing buffer #13200

Closed
opened 2026-04-13 00:20:39 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/10291
Author: @jessegross
Created: 4/15/2025
Status: Merged
Merged: 4/15/2025
Merged by: @jessegross

Base: main ← Head: jessegross/leak


📝 Commits (1)

  • c486b2c ggml: Free ggml_backend_buffer_t when releasing buffer

📊 Changes

6 files changed (+224 additions, -53 deletions)

Changed files:

➖ llama/patches/0001-cuda.patch (+0 -47)
➕ llama/patches/0001-ggml-backend-malloc-and-free-using-the-same-compiler.patch (+210 -0) (see the sketch below)
📝 llama/patches/0006-conditional-fattn.patch (+2 -2)
📝 llama/patches/0008-add-unpad-operator.patch (+3 -3)
📝 ml/backend/ggml/ggml/src/ggml-backend.cpp (+7 -1)
📝 ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu (+2 -0)
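
The name of the newly added patch hints at why this leak was easy to introduce: if a ggml_backend_buffer_t is allocated with `new` by code built with one compiler (for example, nvcc for the CUDA backend) but deleted by code built with another, the two sides may not share the same allocator or C++ runtime, so the safe convention is to allocate and free in the same compilation unit. Below is a minimal sketch of that convention using hypothetical names (`backend_buffer`, `backend_buffer_alloc`, `backend_buffer_release`), not the actual patch code:

```cpp
// Illustrative sketch of "allocate and free in the same translation
// unit"; all names here are hypothetical, not the patch's real API.
#include <cstddef>
#include <cstdlib>

struct backend_buffer {
    void * device_ptr;
    size_t size;
};

// Both functions are defined in the backend's own source file, so the
// `new` and `delete` below always run through the same compiler and
// C++ runtime, even if the rest of the program is built differently.
backend_buffer * backend_buffer_alloc(size_t size) {
    backend_buffer * buf = new backend_buffer{};
    buf->size       = size;
    buf->device_ptr = std::malloc(size);  // stand-in for cudaMalloc etc.
    return buf;
}

void backend_buffer_release(backend_buffer * buf) {
    if (buf == nullptr) {
        return;
    }
    std::free(buf->device_ptr);  // stand-in for cudaFree etc.
    delete buf;                  // struct freed where it was allocated
}
```

Generic code would then call backend_buffer_release() rather than deleting the struct directly.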

📄 Description

When ggml_backend_buffer_free() is called, the device memory is released, but not all backends consistently free the ggml_backend_buffer_t struct itself in system RAM, causing a memory leak.

Bug #10040
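
To make the failure mode concrete, the generic free path in ggml-backend.cpp has roughly the shape sketched below (simplified and self-contained; the real headers have more fields, and the exact code in this PR may differ). The backend callback releases the device memory, while the final delete is what frees the ggml_backend_buffer_t struct in system RAM; if that deletion is skipped, every freed buffer leaks its host-side bookkeeping struct:

```cpp
// Simplified sketch of the generic buffer-free path; types are reduced
// to the essentials and may differ from the real ggml headers.
#include <cstddef>

struct ggml_backend_buffer;
typedef struct ggml_backend_buffer * ggml_backend_buffer_t;

struct ggml_backend_buffer_i {
    // Releases backend resources (device memory, staging buffers, ...).
    void (*free_buffer)(ggml_backend_buffer_t buffer);
};

struct ggml_backend_buffer {
    ggml_backend_buffer_i iface;
    void * context;
    size_t size;
};

void ggml_backend_buffer_free(ggml_backend_buffer_t buffer) {
    if (buffer == nullptr) {
        return;
    }

    // Backend-specific cleanup: releases the device memory (VRAM etc.).
    if (buffer->iface.free_buffer != nullptr) {
        buffer->iface.free_buffer(buffer);
    }

    // The buffer struct itself lives in system RAM. Without this delete
    // (or an equivalent in the backend callback), every freed buffer
    // leaks its host-side bookkeeping struct.
    delete buffer;
}
```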


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:20:39 -05:00

Reference: github-starred/ollama#13200