[GH-ISSUE #3643] how to change the max input token length when I run `ollama run gemma:7b-instruct-v1.1-fp16` #2246

Closed
opened 2026-04-12 12:31:14 -05:00 by GiteaMirror · 8 comments

Originally created by @dh12306 on GitHub (Apr 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3643

The default input token length is 2048? How can I change it, since Gemma can support more input tokens?


@dims commented on GitHub (Apr 15, 2024):

tip from @jmorganca here - https://github.com/ollama/ollama/issues/3644#issuecomment-2057646417


@jmorganca commented on GitHub (Apr 17, 2024):

Hi all, to change the max token length you can use `/set parameter num_ctx <context size>`, e.g. `4096`, `8192`, or more.

Hope this helps!
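For example, in an interactive session it looks like this (a sketch; the exact confirmation line Ollama prints may vary by version):

```
$ ollama run gemma:7b-instruct-v1.1-fp16
>>> /set parameter num_ctx 8192
Set parameter 'num_ctx' to '8192'
```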


@0sengseng0 commented on GitHub (Apr 18, 2024):

> Hi all, to change the max token length you can use `/set parameter num_ctx <context size>`, e.g. `4096`, `8192`, or more.
>
> Hope this helps!

After I increased the token size (`/set parameter num_ctx 40960`), it seemed like every time I asked a question, the model would restart before answering.


@BoyuanGao commented on GitHub (Aug 1, 2024):

> Hi all, to change the max token length you can use `/set parameter num_ctx <context size>`, e.g. `4096`, `8192`, or more.
>
> Hope this helps!

Hi @jmorganca, would you please tell me how to execute this on Windows?


@vish01 commented on GitHub (Aug 28, 2024):

If you're running this using the Ollama JS package, you can pass it in the `options` property:

```
import ollama from "ollama";

const finalPrompt = "Why is the sky blue?"; // placeholder prompt

const stream = await ollama.generate({
  model: "llama3.1",
  prompt: finalPrompt,
  stream: true,
  options: {
    num_ctx: 8192, // context window size to request
  },
});

for await (const part of stream) {
  process.stdout.write(part.response);
}
```
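The same `options` object is accepted by Ollama's REST API (in the body of a `POST /api/generate` request), so non-JS clients can set `num_ctx` per request the same way.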

@vitustockholm commented on GitHub (Aug 30, 2024):

> > Hi all, to change the max token length you can use `/set parameter num_ctx <context size>`, e.g. `4096`, `8192`, or more.
> >
> > Hope this helps!
>
> Hi @jmorganca, would you please tell me how to execute this on Windows?

Use the web UI to change this parameter.


@Arslan-Mehmood1 commented on GitHub (Dec 2, 2024):

Since I faced this issue while using Ollama with LangChain, I checked the API reference for loading the model and found that I can set the `num_ctx` parameter for the context window length:

https://python.langchain.com/api_reference/ollama/llms/langchain_ollama.llms.OllamaLLM.html#langchain_ollama.llms.OllamaLLM.num_ctx
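A minimal sketch of that approach, assuming the `langchain-ollama` package is installed and a local Ollama server is running (the model name and prompt are illustrative):

```
from langchain_ollama import OllamaLLM

# num_ctx is passed through to Ollama as the context window size
llm = OllamaLLM(
    model="gemma:7b-instruct-v1.1-fp16",
    num_ctx=8192,
)

print(llm.invoke("Summarize the main point of this thread in one sentence."))
```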


@PasaOpasen commented on GitHub (Nov 12, 2025):

Is there a way to globally limit this value via an environment variable?
