[PR #14743] [CLOSED] server: use conservative VRAM-based default context lengths #40689

Closed
opened 2026-04-23 01:32:42 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14743
Author: @dhiltgen
Created: 3/9/2026
Status: Closed

Base: main ← Head: perf_2


📝 Commits (1)

  • 11185fa server: use conservative VRAM-based default context lengths

📊 Changes

3 files changed (+17 additions, -6 deletions)


📝 envconfig/config.go (+1 -1)
📝 server/routes.go (+6 -3)
📝 server/routes_options_test.go (+10 -2)

📄 Description

The previous tiered defaults (4k/32k/256k) were too aggressive for large models, causing CPU offload and severe performance degradation.

Tested worst-case models at each VRAM tier to find spill points:

  • ~24 GiB VRAM: qwq 32B spills at 16k → default 8k
  • 51.8 GiB VRAM (64GB Mac): llama3.1:70b spills at 32k → default 16k
  • 96.0 GiB VRAM (128GB Mac): deepseek-r1:70b spills at 128k → default 64k

New tiers: 4k / 8k / 16k / 64k (was 4k / 32k / 256k)
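
For illustration, a minimal Go sketch of the tiering described above. The function name, thresholds, and placement are assumptions for readability, not the actual implementation (which lives in server/routes.go):

```go
package server

// defaultContextLength picks a conservative default context length from the
// total VRAM available, so worst-case models at each tier stay on the GPU
// instead of spilling into CPU offload. Thresholds are illustrative only.
func defaultContextLength(vramBytes uint64) int {
	const GiB = 1024 * 1024 * 1024
	switch {
	case vramBytes >= 96*GiB: // e.g. 128GB Mac (~96.0 GiB usable VRAM)
		return 64 * 1024
	case vramBytes >= 50*GiB: // e.g. 64GB Mac (~51.8 GiB usable VRAM)
		return 16 * 1024
	case vramBytes >= 24*GiB: // e.g. a 24 GiB discrete GPU
		return 8 * 1024
	default:
		return 4 * 1024
	}
}
```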


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-23 01:32:42 -05:00

Reference: github-starred/ollama#40689