[GH-ISSUE #4230] Unfinished sentences when setting num_predict parameter #28399

Closed
opened 2026-04-22 06:34:01 -05:00 by GiteaMirror · 2 comments

Originally created by @mariomorvan on GitHub (May 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4230

What is the issue?

I have tried multiple values of `num_predict` between 30 and 100 with two models (llama2 and llava).
In all cases the last sentence is often cut short, which makes it quite inconvenient to use in applications.
I'm not entirely sure whether this is a bug or just expected behaviour that should be handled another way (prompt engineering, perplexity, postprocessing...).

The problem has already been mentioned by several people in this issue https://github.com/langgenius/dify/issues/2461#issuecomment-2047412964
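For context, here is roughly how the cap gets set (a minimal reproduction sketch against the standard Ollama REST API; the model, prompt, and port are illustrative assumptions):

```python
import requests

# Minimal reproduction sketch: request a completion with a hard token cap.
# Assumes a local Ollama server on the default port (http://localhost:11434).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Explain what a transformer model is.",
        "stream": False,
        "options": {"num_predict": 30},  # generation stops here, even mid-sentence
    },
)
print(resp.json()["response"])  # the tail is often an unfinished sentence
```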

OS

macOS

GPU

Intel

CPU

Intel

Ollama version

0.1.33

GiteaMirror added the bug label 2026-04-22 06:34:01 -05:00

@jmorganca commented on GitHub (May 7, 2024):

Hi @mariomorvan thanks for the issue. This is expected behavior: `num_predict` decides how many tokens (roughly, words) will be generated. You'd want to leave enough room for at least one complete sentence.

That said, I totally understand you'd like to limit the length _and_ receive a complete answer. This is something we'll consider in the future! A good tip for this is to mention the length of the response in the prompt. For example, `answer this question in a single sentence of no more than 10 words` - the language model will often oblige :)
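To make that concrete, here is a sketch combining both workarounds: stating the desired length in the prompt, plus a postprocessing fallback that trims any cut-off trailing fragment. The endpoint and `num_predict` option are the standard Ollama REST API; the helper function and the regex are illustrative assumptions, not anything built into Ollama.

```python
import re
import requests

def generate_capped(prompt: str, max_tokens: int = 60) -> str:
    """Ask for a short answer in the prompt itself, then trim a cut-off tail."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama2",
            # Workaround 1: state the desired length in the prompt.
            "prompt": f"{prompt} Answer in a single sentence of no more than 10 words.",
            "stream": False,
            "options": {"num_predict": max_tokens},
        },
    )
    text = resp.json()["response"].strip()
    # Workaround 2 (postprocessing): keep everything up to the last sentence-ending
    # punctuation mark, dropping any dangling fragment after the final . ! or ?
    match = re.search(r"^.*[.!?]", text, re.DOTALL)
    return match.group(0) if match else text

print(generate_capped("What is a transformer model?"))
```

If the model follows the prompt hint, the regex never needs to cut anything; it only kicks in when the `num_predict` cap wins.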


@mariomorvan commented on GitHub (May 8, 2024):

Thanks - looks like a useful and often effective workaround.

> Hi @mariomorvan thanks for the issue. This is expected behavior: `num_predict` decides how many tokens (roughly, words) will be generated. You'd want to leave enough room for at least one complete sentence.
>
> That said, I totally understand you'd like to limit the length _and_ receive a complete answer. This is something we'll consider in the future! A good tip for this is to mention the length of the response in the prompt. For example, `answer this question in a single sentence of no more than 10 words` - the language model will often oblige :)

Reference: github-starred/ollama#28399