[GH-ISSUE #7691] [Docs] Incorrect default value for num_predict? #30671

Closed
opened 2026-04-22 10:33:52 -05:00 by GiteaMirror · 1 comment

Originally created by @owboson on GitHub (Nov 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7691

The API documentation (https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion) refers to https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values for more information about the parameters that can be specified in the `options` field of a chat completion request.

However, the default value for the `num_predict` parameter described there either doesn't apply to calls made via the Python library (in which case the docs should emphasise this) or is incorrect.

For `num_predict`, the docs say:

> Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context)

I initially wondered how Ollama could generate responses much longer than 128 tokens (without me specifying a value for the parameter). After adding a debug statement in `router.go`, I noticed that the server received a value of -1 for `num_predict`, which matches my previous observations.
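
A minimal way to see this (a sketch, not part of the original report) is to compare token counts with and without an explicit `num_predict`, here using Ollama's Go API client; the model name `llama3` is an assumption, so substitute any locally pulled model:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ollama/ollama/api"
)

// chatTokens sends one chat request and returns the EvalCount (number of
// tokens generated) reported in the final streamed chunk.
func chatTokens(client *api.Client, opts map[string]any) int {
	var count int
	req := &api.ChatRequest{
		Model:    "llama3",
		Messages: []api.Message{{Role: "user", Content: "Write a long story."}},
		Options:  opts,
	}
	err := client.Chat(context.Background(), req, func(resp api.ChatResponse) error {
		if resp.Done { // the final chunk carries the generation metrics
			count = resp.EvalCount
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
	return count
}

func main() {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}

	// No num_predict set: if the documented default (128) applied, this
	// count should never exceed ~128. In practice it runs much longer.
	fmt.Println("default:", chatTokens(client, nil))

	// Explicit cap for comparison.
	fmt.Println("capped: ", chatTokens(client, map[string]any{"num_predict": 128}))
}
```

If the documented default of 128 applied, the first call would be capped just like the second; the behaviour observed instead matches a default of -1.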

As a consequence, the documentation is either misleading or gives an incorrect default value.

GiteaMirror added the documentation label 2026-04-22 10:33:52 -05:00

@rick-github commented on GitHub (Nov 15, 2024):

The documentation is wrong; the default value is -1.

https://github.com/ollama/ollama/blob/d875e99e4639dc07af90b2e3ea0d175e2e692efb/api/types.go#L590
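
Paraphrased (not a verbatim copy of the linked source), the relevant default looks roughly like this:

```go
// Paraphrased from the linked api/types.go, not a verbatim copy.
func DefaultOptions() Options {
	return Options{
		NumPredict: -1, // -1 = infinite generation; the docs claim 128
		// ...other defaults elided...
	}
}
```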


Reference: github-starred/ollama#30671