[GH-ISSUE #9749] Feature Request: Enforce Fixed num_ctx Parameter to Optimize Model Loading Performance #68426

Closed
opened 2026-05-04 13:55:32 -05:00 by GiteaMirror · 1 comment

Originally created by @mlibre on GitHub (Mar 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9749

Hey,

Currently, while the `OLLAMA_CONTEXT_LENGTH` environment variable sets a default context window size, clients can override this by specifying the `num_ctx` parameter in API requests. This flexibility can lead to performance issues, especially when Ollama is used as a code assistance server. Different extensions specifying varying `num_ctx` values cause Ollama to unload and reload the same model with different context sizes, resulting in significant delays and a sluggish user experience.
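
For illustration, here is a minimal sketch of the problem using Python's `requests` library, assuming a local Ollama server on its default port; the model name is a placeholder. Each change of `num_ctx` forces the server to reload the model with a new context allocation:

```python
import requests  # assumes a local Ollama server on the default port

OLLAMA_URL = "http://localhost:11434/api/generate"

# Two hypothetical extensions hitting the same model with different
# context sizes; each change of num_ctx causes Ollama to reload the
# model with a new context allocation.
for num_ctx in (2048, 8192):
    resp = requests.post(OLLAMA_URL, json={
        "model": "llama3.2",  # placeholder model name
        "prompt": "Hello",
        "stream": False,
        "options": {"num_ctx": num_ctx},
    })
    # total_duration (nanoseconds) makes the reload penalty visible
    print(num_ctx, resp.json().get("total_duration"))
```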

I am proposing the addition of a new environment variable, such as `OLLAMA_FIXED_CONTEXT_LENGTH`, that would ignore the `num_ctx` value across all client requests.
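
As a rough sketch of the intended behavior (not an existing Ollama feature), a small reverse proxy could enforce a fixed context length today by overwriting `num_ctx` in every request body before forwarding. All names here, including the `PROXY_FIXED_CONTEXT_LENGTH` variable, are hypothetical, and streaming responses are out of scope:

```python
# Hypothetical workaround: a tiny reverse proxy that pins num_ctx to a
# fixed value before forwarding to Ollama. Nothing here is part of any
# real Ollama configuration.
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

UPSTREAM = "http://localhost:11434"  # real Ollama server
# Hypothetical env var for this proxy, not recognized by Ollama itself
FIXED_CTX = int(os.environ.get("PROXY_FIXED_CONTEXT_LENGTH", "8192"))

class StripNumCtx(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        # Overwrite whatever num_ctx the client sent with the fixed value,
        # so every request hits the same loaded model instance.
        body.setdefault("options", {})["num_ctx"] = FIXED_CTX
        upstream = requests.post(UPSTREAM + self.path, json=body)
        self.send_response(upstream.status_code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(upstream.content)

if __name__ == "__main__":
    HTTPServer(("localhost", 11435), StripNumCtx).serve_forever()
```

Pointing clients at port 11435 instead of 11434 would then guarantee a single context size regardless of what each extension sends.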

GiteaMirror added the feature request label 2026-05-04 13:55:33 -05:00

@nickthecook commented on GitHub (May 28, 2025):

This would be useful.

The Ollama server knows how much VRAM is available. Every client app shouldn't need to. This should be set and enforced in Ollama.
