[PR #2964] [MERGED] Allow setting max vram for workarounds #11015

Closed
opened 2026-04-12 23:18:44 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/2964
Author: @dhiltgen
Created: 3/7/2024
Status: Merged
Merged: 3/7/2024
Merged by: @dhiltgen

Base: main ← Head: mem_limit_var


📝 Commits (1)

  • be33017 Allow setting max vram for workarounds

📊 Changes

2 files changed (+24 additions, -0 deletions)


📝 gpu/gpu.go (+9 -0)
📝 gpu/gpu_darwin.go (+15 -0)

📄 Description

Until we get all the memory calculations correct, this provides an escape valve for users to work around out-of-memory crashes.

Example usage (Windows), capping VRAM at 3221225472 bytes (3 GiB):

```powershell
$env:OLLAMA_MAX_VRAM="3221225472"
ollama.exe serve
...
time=2024-03-06T16:52:04.246-08:00 level=INFO source=gpu.go:251 msg="user override OLLAMA_MAX_VRAM=3221225472"
...
llm_load_tensors: offloading 20 repeating layers to GPU
llm_load_tensors: offloaded 20/33 layers to GPU
llm_load_tensors:        CPU buffer size =  3647.87 MiB
llm_load_tensors:      CUDA0 buffer size =  2171.88 MiB
```

This was llama2, which would normally have fit entirely on my GPU.
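For context, here is a minimal Go sketch of how an override like this can be read and applied. The helper name, log wording, and clamping logic are illustrative assumptions, not the actual contents of gpu/gpu.go:

```go
package main

import (
	"log/slog"
	"os"
	"strconv"
)

// maxVRAMOverride reads OLLAMA_MAX_VRAM (a byte count) and returns it, or 0
// when the variable is unset or unparsable. Hypothetical helper; the real
// change in gpu/gpu.go may structure this differently.
func maxVRAMOverride() uint64 {
	raw := os.Getenv("OLLAMA_MAX_VRAM")
	if raw == "" {
		return 0
	}
	v, err := strconv.ParseUint(raw, 10, 64)
	if err != nil {
		slog.Warn("ignoring invalid OLLAMA_MAX_VRAM", "value", raw, "error", err)
		return 0
	}
	slog.Info("user override", "OLLAMA_MAX_VRAM", v)
	return v
}

func main() {
	// Pretend the GPU library reported 8 GiB free; applying the override
	// clamps that figure, so the scheduler offloads fewer layers.
	freeVRAM := uint64(8 << 30)
	if limit := maxVRAMOverride(); limit > 0 && limit < freeVRAM {
		freeVRAM = limit
	}
	slog.Info("effective free VRAM", "bytes", freeVRAM)
}
```

Running this sketch with OLLAMA_MAX_VRAM=3221225472 set would clamp the reported 8 GiB down to 3 GiB, mirroring the partial offload (20/33 layers) shown in the log above.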


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:18:44 -05:00

Reference: github-starred/ollama#11015