[PR #15044] [MERGED] launch: warning when server context length is below 64k for local models #20257

Closed
opened 2026-04-16 07:31:37 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15044
Author: @hoyyeva
Created: 3/24/2026
Status: Merged
Merged: 3/27/2026
Merged by: @ParthSareen

Base: main ← Head: hoyyeva/local-model-server-context-length-warning


📝 Commits (10+)

  • 9c180db launch: warning when server context length is below 64k for local models
  • d1a35a2 clean up
  • 937bed7 Update cmd/launch/models.go
  • 6ea62bd Update cmd/launch/models.go
  • e9a2309 address comments
  • 2ee2f32 warning based on different os
  • ba87039 fix test
  • 8e34568 address comment
  • f220895 address comments
  • d2da4d1 update the warning to be yellow

📊 Changes

6 files changed (+377 additions, -1 deletions)

View changed files

📝 api/types.go (+2 -1)
📝 cmd/launch/launch.go (+8 -0)
📝 cmd/launch/launch_test.go (+231 -0)
📝 cmd/launch/models.go (+72 -0)
📝 server/routes.go (+8 -0)
📝 server/routes_cloud_test.go (+56 -0)

📄 Description

  • Add context_length field to /api/status response, reflecting the server's effective default context length (OLLAMA_CONTEXT_LENGTH env var, falling back to the VRAM-based default)
  • After model selection in ollama launch, warn the user if the effective context length is below 64k tokens for any selected local model
  • Cloud-only model selections skip the check since they don't use the local server's context window
  • Also checks per-model num_ctx overrides from Modelfiles — if a model's Modelfile sets a low num_ctx, the warning fires even when the server default is sufficient, with guidance to use an official model instead
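The selection-time check described above can be sketched in Go roughly as follows. This is a minimal illustration, not the PR's actual code: the identifiers (modelInfo, effectiveContext, warnLowContext) are hypothetical, while the 64k threshold, the cloud-model skip, and the num_ctx-override precedence come from the description.

```go
package main

import "fmt"

// minContextLength mirrors the PR's 64k recommendation.
const minContextLength = 64000

// modelInfo is a hypothetical stand-in for the per-model data that
// ollama launch would have after model selection.
type modelInfo struct {
	Name   string
	Cloud  bool // cloud-only models don't use the local server's context window
	NumCtx int  // Modelfile num_ctx override; 0 means "use the server default"
}

// effectiveContext returns the context length a model would actually run
// with: a Modelfile num_ctx override takes precedence over the server default.
func effectiveContext(m modelInfo, serverDefault int) int {
	if m.NumCtx > 0 {
		return m.NumCtx
	}
	return serverDefault
}

// warnLowContext collects a warning for every selected local model whose
// effective context length falls below the recommended minimum.
func warnLowContext(models []modelInfo, serverDefault int) []string {
	var warnings []string
	for _, m := range models {
		if m.Cloud {
			continue // cloud selections skip the check entirely
		}
		ctx := effectiveContext(m, serverDefault)
		if ctx >= minContextLength {
			continue
		}
		msg := fmt.Sprintf("Warning: context window is %d tokens (recommended: %d+)", ctx, minContextLength)
		if m.NumCtx > 0 {
			// Low num_ctx baked into the Modelfile: suggest an official model.
			msg += "\nConsider using an official model and increase the context length in Ollama App Settings."
		} else {
			// Low server default: suggest raising it.
			msg += "\nIncrease it in Ollama App Settings or with OLLAMA_CONTEXT_LENGTH=64000 ollama serve"
		}
		warnings = append(warnings, msg)
	}
	return warnings
}

func main() {
	for _, w := range warnLowContext([]modelInfo{
		{Name: "llama3"},
		{Name: "some-cloud-model", Cloud: true},
	}, 4096) {
		fmt.Println(w)
	}
}
```

Note how the num_ctx override makes the warning fire even when the server default is above 64k, matching the fourth bullet above.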

Example warnings

  • Server context too low:
    Warning: context window is 4096 tokens (recommended: 64000+)
    Increase it in Ollama App Settings or with OLLAMA_CONTEXT_LENGTH=64000 ollama serve
  • Modelfile num_ctx override too low:
    Warning: context window is 4096 tokens (recommended: 64000+)
    Consider using an official model and increase the context length to 64000 in Ollama App Settings.
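
Clients can read the server's effective default by decoding the /api/status payload. A minimal sketch: only the context_length JSON key comes from this PR; the statusResponse type and parseStatus helper are illustrative names, and the real response type in api/types.go has more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusResponse models just the field this PR adds to /api/status.
type statusResponse struct {
	ContextLength int `json:"context_length"`
}

// parseStatus extracts context_length from a raw /api/status response body.
func parseStatus(body []byte) (int, error) {
	var s statusResponse
	if err := json.Unmarshal(body, &s); err != nil {
		return 0, err
	}
	return s.ContextLength, nil
}

func main() {
	// Payload shape is illustrative of a server whose effective default
	// context length is 4096 tokens.
	n, err := parseStatus([]byte(`{"context_length": 4096}`))
	if err != nil {
		panic(err)
	}
	fmt.Println("server default context length:", n)
	if n < 64000 {
		fmt.Println("below the 64k recommended for ollama launch")
	}
}
```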

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 07:31:37 -05:00
Reference: github-starred/ollama#20257