[GH-ISSUE #9977] How can we prevent creating a new complete model instance in GPU when using different context lengths? #6536

Closed
opened 2026-04-12 18:09:07 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @jaybom on GitHub (Mar 25, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9977

What is the issue?

How can we prevent creating a new complete model instance in GPU when using different context lengths?

Relevant log output


OS

Windows

GPU

Nvidia

CPU

No response

Ollama version

0.5.7

GiteaMirror added the bug label 2026-04-12 18:09:07 -05:00
Author
Owner

@rick-github commented on GitHub (Mar 25, 2025):

Currently, a change in context length requires a model reload. To avoid the reload, all clients would need to be configured with the same context length. PRs such as #8935 and #9978 propose alternative mechanisms for dealing with this.
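A minimal sketch of the workaround described above, assuming the standard Ollama REST API at `http://localhost:11434/api/generate` and its documented `num_ctx` option: if every client sends the same `num_ctx` in its request options (or omits it entirely), the already-loaded GPU model instance can be reused instead of reloaded. The shared value and helper name here are illustrative, not part of any Ollama API.

```python
import json

# Pick ONE context length and use it from every client; a mismatch
# forces Ollama to reload the model with a differently sized KV cache.
SHARED_NUM_CTX = 8192  # illustrative value, not an Ollama default

def build_generate_payload(model: str, prompt: str) -> dict:
    """Build a payload for POST http://localhost:11434/api/generate."""
    return {
        "model": model,
        "prompt": prompt,
        "options": {"num_ctx": SHARED_NUM_CTX},
    }

client_a = build_generate_payload("llama3.2", "Summarize this log.")
client_b = build_generate_payload("llama3.2", "Translate to French.")

# Same model + same num_ctx -> both requests can be served by the
# single model instance already resident on the GPU.
assert client_a["options"] == client_b["options"]
print(json.dumps(client_a, indent=2))
```

Each payload would then be POSTed to the `/api/generate` endpoint as usual; the point is only that the `options.num_ctx` field is identical across clients.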

<!-- gh-comment-id:2751179863 -->

Reference: github-starred/ollama#6536