[GH-ISSUE #6433] Manage output length? #66080

Closed
opened 2026-05-03 23:52:43 -05:00 by GiteaMirror · 2 comments

Originally created by @nic0711 on GitHub (Aug 20, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6433

Hi. Is it possible to manage the output length?
I use ollama with llama3.1-instruction-8q and fabric.

My problem is that the output is too short and ends the last sentence unfinished.

The biggest/longest output was 1200 characters...

Can I change this?
Thank you.

GiteaMirror added the question label 2026-05-03 23:52:43 -05:00

@pdevine commented on GitHub (Aug 23, 2024):

@nic0711 you can extend the context length. In the CLI you can just type:

```
>>> /set parameter num_ctx 8192
```

That sets the context length to 8192. You will need more VRAM (if you're using a GPU) to support a larger context; otherwise some layers of the model may be offloaded onto your CPU. Llama3.1 can go up to a 128k context length.
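
For reference, the same parameter can be set outside the interactive CLI as well. Below is a minimal sketch using Ollama's REST API `options` field; it assumes a local Ollama server on the default port (11434) and a model tag of `llama3.1`, so adjust both to your setup:

```python
import requests

# Request a completion from a local Ollama server with a larger context window.
# "num_ctx" is the same context-length parameter as `/set parameter num_ctx` in the CLI.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",           # assumed tag; use whatever you pulled
        "prompt": "Summarize the following text ...",
        "stream": False,
        "options": {"num_ctx": 8192},  # context length
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```

The equivalent can also be baked into a custom model via a Modelfile line such as `PARAMETER num_ctx 8192`, if you would rather not pass it on every request.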


@DraculaVladimir commented on GitHub (Jun 29, 2025):

> @nic0711 you can extend the context length. In the CLI you can just type:
>
> ```
> >>> /set parameter num_ctx 8192
> ```
>
> That sets the context length to 8192. You will need more VRAM (if you're using a GPU) to support a larger context; otherwise some layers of the model may be offloaded onto your CPU. Llama3.1 can go up to a 128k context length.

The man had asked for OUTPUT length, not for INPUT.

I find your answer inadequate.
