[GH-ISSUE #3644] Is the model's PROMPT maximum number of tokens determined by the inference tool? #2247

Closed
opened 2026-04-12 12:31:18 -05:00 by GiteaMirror · 3 comments

Originally created by @17Reset on GitHub (Apr 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3644

When I use Ollama to run inference with my Smaug-72B model, there is no output when the input prompt is around 150 tokens, but output is normal when it is scaled down to about 100 tokens.

@jmorganca commented on GitHub (Apr 15, 2024):

Hi there, the context window defaults to 2048 and can be overridden with the `num_ctx` option: https://github.com/ollama/ollama/blob/main/docs/api.md#generate-request-with-options
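
As an illustration, a generate request that raises the context window to 4096 tokens looks like this (the model name `llama3` is a placeholder; substitute your own):

```sh
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {
    "num_ctx": 4096
  }
}'
```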

@17Reset commented on GitHub (Apr 16, 2024):

It still doesn't work. Even if I raise the `num_ctx` value, it doesn't answer anything; when the prompt is too long there is no output. Is this model-related?

@realquatro commented on GitHub (Apr 29, 2024):

> Hi there, the context window defaults to 2048 and can be overridden with the `num_ctx` option: https://github.com/ollama/ollama/blob/main/docs/api.md#generate-request-with-options

It does work on http://localhost:11434/api/generate.
But how can I increase the context window when I use http://localhost:11434/v1/chat/completions?
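
For reference, one workaround suggested in Ollama's documentation (not confirmed in this thread, so treat it as an assumption) is that the OpenAI-compatible endpoint does not accept Ollama's `options` field, so `num_ctx` has to be baked into the model itself via a Modelfile. A minimal sketch, with the names `llama3` and `llama3-8k` purely illustrative:

```
# Modelfile — derives a model variant with a larger context window
FROM llama3
PARAMETER num_ctx 8192
```

Create the variant and reference it by name in the chat completions request:

```sh
ollama create llama3-8k -f Modelfile

curl http://localhost:11434/v1/chat/completions -d '{
  "model": "llama3-8k",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}]
}'
```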
