[PR #5922] [MERGED] Introduce GPU Overhead env var #11964

Closed
opened 2026-04-12 23:44:16 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/5922
Author: @dhiltgen
Created: 7/24/2024
Status: Merged
Merged: 9/5/2024
Merged by: @dhiltgen

Base: main ← Head: gpu_overhead


📝 Commits (1)

  • cde1aa9 Introduce GPU Overhead env var

📊 Changes

3 files changed (+28 additions, -3 deletions)


📝 cmd/cmd.go (+1 -0)
📝 envconfig/config.go (+20 -0)
📝 llm/memory.go (+7 -3)

📄 Description

Provide a mechanism for users to set aside an amount of VRAM on each GPU, either to make room for other applications they want to start after Ollama or to work around memory prediction bugs.

Concurrency and multi-GPU support made the older OLLAMA_MAX_VRAM setting untenable (many users have multi-GPU setups with different amounts of VRAM), so it was removed in #5855. This new setting uses an "overhead" value that we subtract from the reported free space on each GPU, giving more reasonable and consistent results.

      OLLAMA_GPU_OVERHEAD        Reserve a portion of VRAM per GPU (bytes)
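As a rough illustration of the idea, here is a minimal, hypothetical Go sketch of how a byte count read from OLLAMA_GPU_OVERHEAD could be parsed and subtracted from each GPU's reported free VRAM. The names gpuOverhead and usableVRAM are assumptions for illustration only and do not reflect the actual code in envconfig/config.go or llm/memory.go.

```go
// Hypothetical sketch, not the actual Ollama implementation.
package main

import (
	"fmt"
	"os"
	"strconv"
)

// gpuOverhead reads OLLAMA_GPU_OVERHEAD as a byte count, defaulting to 0
// when the variable is unset or malformed. (Illustrative helper name.)
func gpuOverhead() uint64 {
	v := os.Getenv("OLLAMA_GPU_OVERHEAD")
	if v == "" {
		return 0
	}
	n, err := strconv.ParseUint(v, 10, 64)
	if err != nil {
		return 0 // ignore malformed values in this sketch
	}
	return n
}

// usableVRAM subtracts the per-GPU overhead from the reported free VRAM,
// clamping at zero so a large overhead never underflows.
func usableVRAM(free uint64) uint64 {
	overhead := gpuOverhead()
	if overhead >= free {
		return 0
	}
	return free - overhead
}

func main() {
	// A GPU reporting 8 GiB free: with OLLAMA_GPU_OVERHEAD=536870912
	// (512 MiB), this prints 8053063680 instead of 8589934592.
	fmt.Println(usableVRAM(8 << 30))
}
```

For example, starting the server with OLLAMA_GPU_OVERHEAD=536870912 in the environment would reserve 512 MiB of VRAM on each GPU before the memory estimate is made.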

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:44:16 -05:00
Reference: github-starred/ollama#11964