[PR #10226] [MERGED] ggml: Fix memory leak on input tensors #23725

Closed
opened 2026-04-19 17:10:22 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/10226
Author: @jessegross
Created: 4/10/2025
Status: Merged
Merged: 4/11/2025
Merged by: @jessegross

Base: main ← Head: jessegross/leak


📝 Commits (3)

  • aab6020 ggml: Use pointer receivers for Context
  • a448106 ggml: Don't allocate CPU buffers as CUDA Host buffers
  • 62a5458 ggml: Fix memory leak on input tensors

📊 Changes

1 file changed (+35 additions, -26 deletions)

View changed files

📝 ml/backend/ggml/ggml.go (+35 -26)

📄 Description

For every forward pass through the model, we need to allocate input tensors: tokens, images, positions, outputs, and masks. These are allocated in system memory.

However, when we close the context through which the tensors were allocated, the metadata is freed but the actual backend memory is not. This results in a significant memory leak.

This change ensures that all memory allocated through a context is freed when the context is closed.
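
To illustrate the pattern the fix describes, here is a minimal, self-contained Go sketch of a context that records every allocation it hands out and releases the backing memory on `Close`. The type and method names here are hypothetical stand-ins, not the actual `ml/backend/ggml/ggml.go` API. The pointer receivers mirror the first commit in this PR: with a value receiver, `Alloc` would append to a copy of the context, and the tracked buffers would be silently lost before `Close` could free them.

```go
package main

import "fmt"

// buffer stands in for a backend allocation (e.g. a ggml backend buffer
// holding token, position, or mask data). Illustrative only.
type buffer struct {
	id   int
	size int
}

func (b *buffer) free() {
	fmt.Printf("freed buffer %d (%d bytes)\n", b.id, b.size)
}

// Context tracks every buffer allocated through it so that Close can
// release the backing memory, not just the context's own metadata.
type Context struct {
	buffers []*buffer
}

// Alloc uses a pointer receiver: it must mutate the shared Context,
// not a copy, or the record of this allocation is lost.
func (c *Context) Alloc(id, size int) *buffer {
	b := &buffer{id: id, size: size}
	c.buffers = append(c.buffers, b)
	return b
}

// Close frees every tracked buffer. Before this fix, the analogous
// step was missing, so the backend memory leaked on every forward pass.
func (c *Context) Close() {
	for _, b := range c.buffers {
		b.free()
	}
	c.buffers = nil
}

func main() {
	ctx := &Context{}
	ctx.Alloc(1, 4096) // e.g. a token tensor
	ctx.Alloc(2, 8192) // e.g. a mask tensor
	ctx.Close()        // without this loop, both buffers would leak
}
```

Since a fresh context is created for every forward pass, even a small per-pass leak compounds quickly over a long generation, which is why freeing the buffers at `Close` time (rather than relying on process exit) matters here.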


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 17:10:22 -05:00

Reference: github-starred/ollama#23725