[GH-ISSUE #13261] Make low_vram_mode threshold configurable #34526

Closed
opened 2026-04-22 18:10:46 -05:00 by GiteaMirror · 3 comments

Originally created by @flrtemis on GitHub (Nov 28, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13261

What is the issue?

`ollama serve` drops into low_vram_mode on any machine where the OS reports <20 GiB VRAM, even when the GPU is a high-end 12/16/24 GB card that runs models perfectly. The trigger is a hard-coded literal in server/routes.go:

```go
var lowVRAMThreshold uint64 = 20 * format.GibiByte
```

Because Windows often reports only ~15–18 GiB “free” on a 24–32 GB card, those users are permanently stuck in low_vram_mode with no escape hatch.

Clarification: the OS isn’t “stealing” VRAM; it reports what remains after display buffers/drivers. The problem is the fixed 20 GiB cutoff instead of a configurable value.

Expected behavior

Allow the cutoff to be set via env/config so normal GPU mode stays active unless memory truly falls below the user’s chosen margin.

Proposed fix

    1. Add `var LowVRAMThreshold = Uint64("OLLAMA_LOW_VRAM_THRESHOLD", 20*format.GibiByte)` in envconfig/config.go and register it in `AsMap()` (a sketch of steps 1–2 follows this list).
    2. In server/routes.go, read `envconfig.LowVRAMThreshold()` instead of the literal.
    3. Document `OLLAMA_LOW_VRAM_THRESHOLD` (bytes) in the FAQ.
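
A minimal sketch of steps 1 and 2, assuming a `Uint64` helper with getter-style semantics like the other accessors in envconfig/config.go (the actual helper and its signature may differ; `format.GibiByte` is spelled out as a literal so the snippet stands alone):

```go
package envconfig

import (
	"os"
	"strconv"
)

// Uint64 returns a getter that reads key from the environment as a uint64,
// falling back to defaultValue when the variable is unset or unparsable.
// (Hypothetical helper; ollama's envconfig may already provide an equivalent.)
func Uint64(key string, defaultValue uint64) func() uint64 {
	return func() uint64 {
		if s := os.Getenv(key); s != "" {
			if v, err := strconv.ParseUint(s, 10, 64); err == nil {
				return v
			}
		}
		return defaultValue
	}
}

// LowVRAMThreshold is the cutoff (in bytes) below which the server enters
// low_vram_mode. 20 GiB is kept as the default so behavior is unchanged
// unless OLLAMA_LOW_VRAM_THRESHOLD is set.
var LowVRAMThreshold = Uint64("OLLAMA_LOW_VRAM_THRESHOLD", 20*1024*1024*1024)
```

server/routes.go would then call `envconfig.LowVRAMThreshold()` at the existing comparison site, leaving the 20 GiB default intact for anyone who never sets the variable.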

Reference implementation + build steps: https://github.com/flrtemis/ollama (current HEAD 730f0a8). Setting OLLAMA_LOW_VRAM_THRESHOLD=12000000000 keeps a 16 GB RTX 5070 Ti in normal mode.
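
For reference, the variable takes plain bytes; a standalone conversion (nothing ollama-specific) shows why 12000000000 suits a 16 GB card:

```go
package main

import "fmt"

func main() {
	const thresholdBytes = 12_000_000_000 // the OLLAMA_LOW_VRAM_THRESHOLD value from above
	fmt.Printf("%.2f GiB\n", float64(thresholdBytes)/float64(1<<30))
	// Prints 11.18 GiB: below the free VRAM a 16 GB card reports even after
	// the OS reserves display buffers, so the low-VRAM branch is not taken.
}
```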


Relevant log output

```shell
time=2025-11-27T17:52:11.110-05:00 level=INFO source=routes.go:1545 msg="server config" env="... OLLAMA_LOW_VRAM_THRESHOLD:12000000000 ..."
```

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.3.10 (custom build from commit 730f0a8; “ollama --version” currently reports 0.0.0 after the rebuild)

GiteaMirror added the bug label 2026-04-22 18:10:46 -05:00

@rick-github commented on GitHub (Nov 28, 2025):

https://github.com/ollama/ollama/issues/12143#issuecomment-3242462236


@flrtemis commented on GitHub (Dec 3, 2025):

@rick-github https://flrtemis.github.io/OllamaSourceCodeRebuilt/3d.html


@rick-github commented on GitHub (Dec 3, 2025):

You misunderstand the purpose of low VRAM mode. Processing is not slower in low VRAM mode. All that happens is that for a few [selected architectures](https://github.com/ollama/ollama/blob/cc9555aff0f220748dc761a4302cfaea7c62c9fe/server/routes.go#L146), the default value of `num_ctx` is a lower value. You can achieve the same result by setting `OLLAMA_CONTEXT_LENGTH=8192`.

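
For illustration only, the mechanism described in that comment amounts to a conditional default, roughly like the sketch below; the architecture name and the non-low-VRAM default are placeholders, not the values from the linked routes.go line:

```go
package main

import "fmt"

// reducedCtxArchs stands in for the architecture list at the linked
// routes.go line; "example-arch" is a placeholder name.
var reducedCtxArchs = map[string]bool{"example-arch": true}

// defaultNumCtx returns the default context length: in low VRAM mode,
// the listed architectures fall back to the smaller 8192 default that
// OLLAMA_CONTEXT_LENGTH=8192 would set explicitly.
func defaultNumCtx(arch string, lowVRAM bool) int {
	if lowVRAM && reducedCtxArchs[arch] {
		return 8192 // reduced default in low VRAM mode
	}
	return 32768 // illustrative larger default; actual values differ per arch
}

func main() {
	fmt.Println(defaultNumCtx("example-arch", true))  // 8192
	fmt.Println(defaultNumCtx("example-arch", false)) // 32768
}
```
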
Reference: github-starred/ollama#34526