[PR #11090] [MERGED] New Memory Management #23977

Closed
opened 2026-04-19 17:18:58 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/11090
Author: @jessegross
Created: 6/16/2025
Status: Merged
Merged: 8/14/2025
Merged by: @jessegross

Base: main ← Head: jessegross/memory


📝 Commits (1)

  • 76ed257 llm: New memory management (https://github.com/ollama/ollama/commit/76ed257c0c4c3d2a622b50b4db020c4ff0ba4640)

📊 Changes

26 files changed (+1855 additions, -895 deletions)

View changed files

📝 discover/amd_linux.go (+34 -31)
📝 envconfig/config.go (+3 -0)
📝 fs/ggml/ggml.go (+2 -0)
📝 llama/llama.go (+16 -0)
📝 llama/patches/0016-add-C-API-for-mtmd_input_text.patch (+0 -0)
➖ llama/patches/0016-temporary-prevent-rocm-cuda-mixed-loading.patch (+0 -32)
📝 llama/patches/0017-no-power-throttling-win32-with-gnuc.patch (+0 -0)
📝 llama/patches/0018-BF16-macos-version-guard.patch (+0 -0)
📝 llama/patches/0019-Enable-CUDA-Graphs-for-gemma3n.patch (+0 -0)
📝 llama/patches/0020-Disable-ggml-blas-on-macos-v13-and-older.patch (+0 -0)
📝 llama/patches/0021-fix-mtmd-audio.cpp-build-on-windows.patch (+0 -0)
📝 llama/patches/0022-ggml-No-alloc-mode.patch (+0 -0)
📝 llm/memory.go (+74 -20)
📝 llm/memory_test.go (+5 -5)
📝 llm/server.go (+868 -179)
📝 llm/server_test.go (+169 -0)
📝 ml/backend.go (+154 -8)
📝 ml/backend/ggml/ggml.go (+121 -103)
📝 ml/backend/ggml/ggml/src/ggml-backend-reg.cpp (+2 -10)
📝 runner/llamarunner/runner.go (+86 -60)

...and 6 more files

📄 Description

This changes the memory allocation strategy from upfront estimation to tracking the actual allocations made by the engine and reacting to them. The goal is to avoid issues caused by both under-estimation (crashing) and over-estimation (low performance due to under-utilized GPUs).

It is currently opt-in and can be enabled for models running on the Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other cases is unchanged and will continue to use the existing estimates.
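
To make the idea concrete, here is a minimal Go sketch of "allocate, observe, react" as opposed to "estimate once upfront". It is not the code from this PR: `device`, `tryLoad`, `fitLayers`, and `errOutOfMemory` are hypothetical names invented for illustration, and the back-off policy (drop one layer at a time) is an assumption chosen for simplicity.

```go
package main

import (
	"errors"
	"fmt"
)

// errOutOfMemory is a hypothetical error for an allocation that does not fit.
var errOutOfMemory = errors.New("out of memory")

// device is a toy GPU with a fixed amount of free memory, in bytes.
type device struct {
	free uint64
}

// tryLoad simulates allocating `layers` layers of `layerSize` bytes each plus
// a fixed `overhead`, and reports the size that was actually requested.
func (d device) tryLoad(layers int, layerSize, overhead uint64) (uint64, error) {
	used := uint64(layers)*layerSize + overhead
	if used > d.free {
		return used, errOutOfMemory
	}
	return used, nil
}

// fitLayers starts with every layer offloaded and backs off one layer at a
// time whenever the observed allocation fails, instead of trusting a
// precomputed estimate. It returns the layer count and bytes that fit.
func fitLayers(d device, totalLayers int, layerSize, overhead uint64) (int, uint64) {
	for n := totalLayers; n > 0; n-- {
		used, err := d.tryLoad(n, layerSize, overhead)
		if err == nil {
			return n, used
		}
	}
	return 0, 0 // nothing fits on this device
}

func main() {
	gpu := device{free: 8 << 30} // assumed 8 GiB of free VRAM
	n, used := fitLayers(gpu, 32, 300<<20, 1<<30)
	fmt.Printf("offloaded %d of 32 layers using %d MiB\n", n, used>>20)
}
```

Because the loop reacts to what the allocation really required, it neither crashes on an optimistic guess nor leaves GPU memory idle because of a pessimistic one. A real implementation would presumably react to per-device allocation reports rather than to a single simulated number.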


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 17:18:58 -05:00
Reference: github-starred/ollama#23977