[GH-ISSUE #12463] Can Ollama support a 256K context length? #8279

Closed
opened 2026-04-12 20:49:31 -05:00 by GiteaMirror · 7 comments

Originally created by @wwshs on GitHub (Oct 1, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12463

Originally assigned to: @jmorganca on GitHub.

Some large models (e.g., Qwen3) support a maximum text length of 256K (262,144 tokens), but the context length setting in Ollama's settings page only allows up to 128K.
Can Ollama support a 256K context length? If so, how can it be configured?

GiteaMirror added the app and feature request labels 2026-04-12 20:49:31 -05:00

@pinghe commented on GitHub (Oct 1, 2025):

The parameter num_ctx is configured in the interface.

@wwshs commented on GitHub (Oct 1, 2025):

> The parameter num_ctx is configured in the interface.

Could you please explain a bit more in detail? Where exactly do I set the parameter num_ctx?

@dan-and commented on GitHub (Oct 1, 2025):

Either use the Modelfile solution: read the documentation on how to tune your model with a Modelfile (https://ollama.readthedocs.io/en/modelfile/#table-of-contents) and keep an eye on "num_ctx" in the documentation,

or do it interactively with:

```
ollama run "modelname"
```

then, inside the chat window:

```
/set parameter num_ctx VALUE
/save "new modelname"
```

@rick-github commented on GitHub (Oct 1, 2025):

I believe the OP is talking about the context length slider in the Windows/macOS app.

@wwshs commented on GitHub (Oct 2, 2025):

> Either use the Modelfile solution: read the documentation on how to tune your model with a Modelfile (https://ollama.readthedocs.io/en/modelfile/#table-of-contents) and keep an eye on "num_ctx" in the documentation,
>
> or do it interactively with:
>
> ollama run "modelname"
> inside the chat window:
> /set parameter num_ctx VALUE
> /save "new modelname"

Yes, what I meant by "the context length setting in Ollama's settings page only allows up to 128K" specifically refers to "the context length slider in the Windows/macOS app." I'm actually using the Windows desktop application interface.

Following the suggestions from pinghe and dan-and, I set the num_ctx parameter directly within my conversations. Specifically, I configured the "num_ctx (Ollama)" parameter on the conversation page in Open WebUI.

Through testing, I've confirmed that when the value of num_ctx set in the conversation differs from the context length configured in the Ollama desktop app's settings page on Windows, the num_ctx value takes precedence. In other words, this method indeed allows temporarily overriding the context length for a given conversation—as long as it doesn't exceed the maximum context length supported by the model itself.
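
For reference, a per-request override of this kind can also be sent directly to Ollama's HTTP API via the "options" field; the model name and prompt below are placeholders:

```
# Request a 262,144-token context for this request only.
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:32b",
  "prompt": "Summarize this document...",
  "options": { "num_ctx": 262144 }
}'
```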

However, I've encountered an issue:

Whenever the num_ctx value set in the conversation differs from the context length specified in Ollama’s Windows app settings, the model gets reloaded after each conversation completes (i.e., once the model finishes generating the full response). During the conversation, the model loads using the num_ctx value. But immediately after the conversation ends, the model is unloaded from Ollama and then reloaded again using the context length defined in the Ollama desktop app’s settings. Then, when I start a new conversation, the model is unloaded once more and reloaded again using the num_ctx value specified in that new conversation.

As a result, the model undergoes a full unload-and-reload cycle both at the start and at the end of every conversation, which significantly slows down the process.
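
One way to observe this unload/reload cycle from the user side is to poll the list of loaded models while a conversation runs; this is just a plain shell loop around the stock ollama ps command:

```
# Poll the resident models once per second; watch the model disappear and
# reappear around the end of each chat turn.
while true; do
  clear
  ollama ps
  sleep 1
done
```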

I’m wondering if there’s any way to resolve this issue through user-side configuration.

Alternatively, it might still be very helpful if the developers could increase the maximum allowed value for the context length slider in Ollama’s settings page—perhaps setting it to the highest context length supported by any model in Ollama’s ecosystem.

@pdevine commented on GitHub (Oct 2, 2025):

@jmorganca I think you may have already fixed this.

@hoyyeva commented on GitHub (Oct 20, 2025):

Closing this issue as it is fixed.

Reference: github-starred/ollama#8279