[PR #3241] [MERGED] update memory estimations for gpu offloading #57798

Closed
opened 2026-04-29 12:31:39 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/3241
Author: @mxyng
Created: 3/19/2024
Status: Merged
Merged: 4/1/2024
Merged by: @mxyng

Base: main ← Head: mxyng/mem


📝 Commits (1)

  • 91b3e4d update memory calcualtions

📊 Changes

7 files changed (+121 additions, -85 deletions)


📝 format/bytes.go (+16 -1)
📝 gpu/gpu.go (+11 -14)
📝 gpu/types.go (+3 -0)
📝 llm/dyn_ext_server.go (+2 -2)
📝 llm/ggml.go (+11 -0)
📝 llm/llm.go (+73 -63)
📝 server/routes.go (+5 -5)

📄 Description

Take the memory footprint of each layer into account when estimating GPU offload:

  1. replace the percentage-based overhead with a static overhead of 377 MiB for CUDA and ROCm
  2. add the projector memory footprint to the estimate
  3. add the layer footprint to the estimate on a per-layer basis, including output layers
  4. replace the static KV memory footprint with a footprint pro-rated by the number of layers to offload
  5. set the minimum context length for multimodal models to 2048
  6. report memory requirements as structured logs
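
The list above can be sketched roughly as follows. This is only an illustration of the idea, not the code from `llm/llm.go`: the names `modelInfo`, `estimate`, `estimateOffload`, `effectiveContext`, and `logEstimate` are hypothetical, and the order in which layers are considered and the way the output layer is counted are assumptions. Only the 377 MiB overhead and the 2048 minimum context come from the description.

```go
package main

import "log/slog"

const MiB uint64 = 1024 * 1024

// Static overhead for CUDA and ROCm, as stated in the PR description.
const gpuOverhead = 377 * MiB

// modelInfo is a hypothetical container for the sizes the estimate needs.
type modelInfo struct {
	LayerBytes     []uint64 // footprint of each repeating layer
	OutputBytes    uint64   // footprint of the output layer(s)
	ProjectorBytes uint64   // multimodal projector, 0 if the model has none
	KVBytes        uint64   // KV footprint if every repeating layer were offloaded
}

// estimate is a hypothetical result type.
type estimate struct {
	Layers   int    // layers that fit, counting the output layer as one (assumption)
	Required uint64 // bytes needed for the chosen layers
}

// estimateOffload walks the layers in order and adds each layer's weights plus
// its pro-rated share of the KV footprint until free memory is exhausted.
func estimateOffload(free uint64, m modelInfo) estimate {
	used := gpuOverhead + m.ProjectorBytes
	if used > free || len(m.LayerBytes) == 0 {
		return estimate{}
	}
	kvPerLayer := m.KVBytes / uint64(len(m.LayerBytes))
	layers := 0
	for _, size := range m.LayerBytes {
		need := size + kvPerLayer
		if used+need > free {
			break
		}
		used += need
		layers++
	}
	// Assumption: the output layer is only considered once all others fit.
	if layers == len(m.LayerBytes) && used+m.OutputBytes <= free {
		used += m.OutputBytes
		layers++
	}
	return estimate{Layers: layers, Required: used}
}

// effectiveContext raises the context length to 2048 for multimodal models.
func effectiveContext(numCtx int, multimodal bool) int {
	if multimodal && numCtx < 2048 {
		return 2048
	}
	return numCtx
}

// logEstimate reports the result as a structured log record.
func logEstimate(free uint64, e estimate) {
	slog.Info("offload estimate", "layers", e.Layers, "required", e.Required, "available", free)
}
```

Calling `estimateOffload` with the free VRAM reported for a device and then passing the result to `logEstimate` would give one key-value log record per estimate. Pro-rating the KV footprint per layer means a partial offload is charged only for the layers that are actually placed on the GPU.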

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 12:31:39 -05:00
Reference: github-starred/ollama#57798