feat: Max Context Length Mode #5640

Closed
opened 2025-11-11 16:26:58 -06:00 by GiteaMirror · 1 comment
Owner

Originally created by @Notbici on GitHub (Jun 25, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

When I download a model, its `num_ctx` is always set to 2048 or something close to that.

I have to go in and change it every time. I know the majority of the user base is on consumer GPUs, so this is a good compromise for them, but I don't know how to stop this behavior.

The other problem is that if you set num_ctx per chat, the second message triggers the chat window title generation, which uses a different context length. This causes the model to unload, making the third message slow before it settles. That could be a separate bug.

Desired Solution you'd like

Have a mode, such as an admin setting, that simply always uses the maximum context length.

On a large machine with a lot of VRAM, there's rarely a model I have problems loading; it's more the contrary, where I might turn the context length down for specific models.

Alternatives Considered

Just editing every model I have in the Admin -> Models tab and setting num_ctx there.

Additional Context

No response

Author
Owner

@tjbck commented on GitHub (Jun 26, 2025):

As you mentioned, you can configure this at the model editor level or from Ollama.
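For reference, the Ollama route works by creating a model variant from a Modelfile that sets `num_ctx` (a sketch; the base model name and context size here are examples, not values from this issue):

```
# Modelfile: derive a variant of an existing model with a larger context window
FROM llama3
PARAMETER num_ctx 8192
```

Then register it with `ollama create llama3-8k -f Modelfile` and select that variant in Open WebUI, so the larger context applies without per-chat overrides.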


Reference: github-starred/open-webui#5640