[GH-ISSUE #2997] Can I force ollama to produce shorter responses? #27599

Closed
opened 2026-04-22 05:03:59 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @Anirudh257 on GitHub (Mar 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2997

Hi,
I want to use the llama2 model available in Ollama to produce shorter outputs. I would like something like the `max_new_tokens` and `max_length`
parameters described in https://huggingface.co/docs/transformers/en/main_classes/text_generation. Can I prompt the LLM to generate shorter sequences while keeping the meaning the same?

There are some approaches given in https://www.reddit.com/r/LocalLLaMA/comments/14k7f5w/any_way_to_limit_the_output_to_a_specific_line/ but they don't work well.

Author
Owner

@aosan commented on GitHub (Mar 11, 2024):

One easy way to shorten answers is to create a new model based on your llama2 model of choice and define your brevity instructions in the Modelfile, under the SYSTEM instruction.

For example, here are prompt instructions that would be suitable as a SYSTEM instruction:

```
ollama run llama2 "Please tell me a joke"

>>> Why don't scientists trust atoms? Because they make up everything! 😂

ollama run llama2 "Please tell me a joke. Don't use more than 5 words"

>>> Sure, here is a short joke:

>>> Kangaroo walks into a bar.
```
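A minimal sketch of that approach (the model name `llama2-brief` and the exact system text are illustrative, not from the original comment):

```
FROM llama2
SYSTEM """You answer as briefly as possible. Keep every response under two sentences."""
```

Save this as `Modelfile`, then build and run the derived model:

```
ollama create llama2-brief -f Modelfile
ollama run llama2-brief "Please tell me a joke"
```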
Author
Owner

@jmorganca commented on GitHub (Mar 11, 2024):

Hi there, you can pass `num_predict` to the API:

```
{
  "options": {
    "num_predict": 25
  }
}
```

However, this simply truncates the response once the token limit is reached, so you should also tell the LLM in the prompt to answer with short responses.
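As a sketch, here is how such a request body for Ollama's `/api/generate` endpoint could be built and sent in Python (stdlib only; the prompt text and the helper name `build_payload` are illustrative, and sending it assumes an Ollama server on the default local port):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt: str, num_predict: int = 25) -> dict:
    """Build a /api/generate request body that caps generation length.

    num_predict limits how many tokens the model may emit; the reply is
    cut off at that count, so pair it with a "be brief" instruction in
    the prompt itself.
    """
    return {
        "model": "llama2",
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": num_predict},
    }


if __name__ == "__main__":
    body = json.dumps(build_payload("Answer briefly: why is the sky blue?"))
    print(body)
    # To actually send it (requires a running Ollama server):
    # req = urllib.request.Request(
    #     OLLAMA_URL,
    #     data=body.encode(),
    #     headers={"Content-Type": "application/json"},
    # )
    # print(urllib.request.urlopen(req).read().decode())
```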

Author
Owner

@Anirudh257 commented on GitHub (Mar 13, 2024):

Thanks, @aosan and @jmorganca. This was pretty helpful!


Reference: github-starred/ollama#27599