[PR #14342] re-add OLLAMA_MAX_VRAM #45878

Open
opened 2026-04-25 01:29:31 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14342
Author: @machineonamission
Created: 2/21/2026
Status: 🔄 Open

Base: main ← Head: main


📝 Commits (5)

  • 6fd5f3e re-add OLLAMA_MAX_VRAM
  • d2abd1c Merge branch 'ollama:main' into main
  • c8edf04 Merge branch 'ollama:main' into main
  • 15f8d50 Merge branch 'umain'
  • 89ba810 Merge branch 'ollama:main' into main

📊 Changes

5 files changed (+11 additions, -4 deletions)

View changed files

📝 cmd/cmd.go (+1 -0)
📝 envconfig/config.go (+2 -0)
📝 llm/server.go (+2 -0)
📝 server/routes.go (+1 -1)
📝 server/sched.go (+5 -3)

📄 Description

OLLAMA_MAX_VRAM is a parameter that once existed to limit the maximum amount of VRAM Ollama consumes. It was deprecated because "it was a hack" and "GPU_OVERHEAD" should be used instead, but I personally find utility in it (reserving a portion of a server's VRAM for Ollama). With this parameter set, Ollama can now run its usual "auto calculate how much to put into VRAM" logic against any VRAM size, instead of requiring manual fiddling with num_gpu for every single model. The limit is now per-GPU, which is documented.

Tested and working on Windows 11


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 01:29:31 -05:00

Reference: github-starred/ollama#45878