[GH-ISSUE #10012] num_ctx #68621

Closed
opened 2026-05-04 14:38:23 -05:00 by GiteaMirror · 5 comments

Originally created by @20246688 on GitHub (Mar 27, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10012

What determines how much context a model supports? For example, is num_ctx in Ollama the setting that controls the context size? If so, why can the model still output the last sentence of an 8000-word document when I ask it to? I don't understand what num_ctx does.

GiteaMirror added the question label 2026-05-04 14:38:24 -05:00

@20246688 commented on GitHub (Mar 27, 2025):

The default num_ctx value for the model is 2048, isn't it?


@forReason commented on GitHub (Mar 27, 2025):

By "the last", do you mean the sentence at the bottom or the one at the top?
The context window holds the most recent tokens, so to check for truncation you should ask for something from the first page.

The default context length can be specified in the model, so your context length may differ depending on the model you downloaded. Try setting it explicitly to make sure it is 2048.
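
As an illustration of setting the value explicitly, here is a minimal sketch that passes `num_ctx` per request through the `options` field of Ollama's `/api/generate` endpoint. The URL assumes a local server on the default port, and the model tag `qwen2.5:7b` and the helper names are placeholders chosen for this example, not something taken from the thread.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, num_ctx: int = 2048) -> dict:
    """Build a request body that sets num_ctx explicitly for a single call."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # ask for one JSON object, not a stream
        "options": {"num_ctx": num_ctx},  # context size in tokens
    }


def generate(model: str, prompt: str, num_ctx: int = 2048, url: str = OLLAMA_URL) -> str:
    """Send the request and return the generated text (needs a running server)."""
    body = json.dumps(build_generate_request(model, prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example call (hypothetical model tag; uncomment with a server running):
# print(generate("qwen2.5:7b", "What is the first sentence of this text? ...", num_ctx=2048))
```

Setting the value per request makes the test independent of whatever default the downloaded model ships with.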


@rick-github commented on GitHub (Mar 27, 2025):

The context is the set of tokens used in inference. The 8000-word document is converted to tokens, where each token represents roughly 2-4 characters (this varies with the model's tokenizer). `num_ctx` sets the size of the context buffer, so if the input converts into more than `num_ctx` tokens, it is truncated to fit. The truncation happens at the start of the buffer, so the tokens corresponding to the last sentence are retained. The context length can also be set with the [`OLLAMA_CONTEXT_LENGTH`](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size) environment variable.
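
The truncation behaviour described above can be sketched in a few lines. This is only an illustration: it uses a toy whitespace tokenizer (one token per word) instead of a real model tokenizer, the function name is made up for the example, and it does not attempt to reproduce how Ollama implements truncation internally.

```python
def truncate_to_context(tokens: list[str], num_ctx: int) -> list[str]:
    """Keep only the last num_ctx tokens, dropping the oldest ones first."""
    if num_ctx <= 0:
        return []
    return tokens[-num_ctx:]


document = "first sentence here . middle part of the text . last sentence here ."
tokens = document.split()  # toy tokenizer: 14 whitespace-separated tokens

kept = truncate_to_context(tokens, num_ctx=4)
print(" ".join(kept))  # prints "last sentence here ."
```

Because the cut is made at the start of the token list, a question about the end of the document can still be answered even when most of the document never reached the model.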

@pdevine commented on GitHub (Mar 27, 2025):

I'll close the issue as answered (thank you @rick-github !)


@20246688 commented on GitHub (Mar 28, 2025):

I tested Qwen2.5-7B and found that with the default settings, when summarizing a 10,000-word document, it could output the first and last sentence, but the overall summary quality was really bad.

Then, I set num_ctx=8192, and it still output the first and last sentence, but the summary quality improved slightly. However, the results were still pretty disappointing. @forReason @rick-github
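
One possible factor, offered as a rough estimate and not a diagnosis: using the 2-4 characters per token figure from earlier in the thread, a 10,000-word document may still not fit in 8192 tokens. The sketch below assumes English-like text averaging about 6 characters per word including the space; that average and the function name are assumptions made for this example, and real token counts depend on the tokenizer and the language.

```python
def estimate_token_range(words: int, avg_chars_per_word: int = 6) -> tuple[int, int]:
    """Return a (low, high) token estimate using 4 and 2 characters per token."""
    chars = words * avg_chars_per_word
    low = chars // 4   # fewer tokens if each token covers about 4 characters
    high = chars // 2  # more tokens if each token covers about 2 characters
    return low, high


low, high = estimate_token_range(10_000)
num_ctx = 8192
print(f"estimated tokens: {low} to {high}, num_ctx: {num_ctx}")
print("fits even at the low estimate:", low <= num_ctx)
```

Under these assumptions the estimate is 15,000 to 30,000 tokens, so part of the document would still be truncated at num_ctx=8192.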

Reference: github-starred/ollama#68621