[GH-ISSUE #8099] ollama run silently truncating prompt #51689

Closed
opened 2026-04-28 20:45:18 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @daniel-j-h on GitHub (Dec 14, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/8099

What is the issue?

The documentation shows how to use ollama run to summarize a file; see

$ ollama run llama3.2 "Summarize this file: $(cat README.md)"

https://github.com/ollama/ollama?tab=readme-ov-file#pass-the-prompt-as-an-argument

What's not obvious here is that by default the prompt (and therefore the file passed in) gets truncated to 2048 tokens.

There's a warning in the server's logs, but it isn't surfaced to users of the ollama run command:

time=2024-12-14T18:01:09.338Z level=WARN source=runner.go:129 msg="truncating input prompt" limit=2048 prompt=3858 keep=5 new=2048

This behavior is not obvious to users, and it's easy to run into without realizing it.

It seems there's currently no way to change this behavior with ollama run. What's the best way forward here? Can we add a parameter to ollama run, or should ollama run issue a warning for users?
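(Editorial aside: the server's REST API does accept a per-request context size via options.num_ctx on /api/generate, which ollama run doesn't expose. A minimal sketch of such a request body in Python, building the JSON only since actually sending it requires a running server; the prompt text is a placeholder:)

```python
import json

# Build a /api/generate request body that raises the context window
# for this one request. The "options.num_ctx" field is documented in
# the Ollama API; 32768 and the prompt text are example values.
payload = {
    "model": "llama3.2",
    "prompt": "Summarize this file: ...",
    "stream": False,
    "options": {"num_ctx": 32768},  # override the 2048-token default
}

body = json.dumps(payload)
# POST `body` to http://localhost:11434/api/generate with curl or urllib.
```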

OS

Linux

GPU

Other

CPU

Intel

Ollama version

0.5.1

GiteaMirror added the bug label 2026-04-28 20:45:18 -05:00

@rick-github commented on GitHub (Dec 14, 2024):

https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-specify-the-context-window-size

The CLI is a simple frontend to the API and has limited functionality; different clients (https://github.com/ollama/ollama?tab=readme-ov-file#community-integrations) are available that make better use of the server. But you are right that this limitation isn't well explained.

A workaround for the problem you highlight is to increase the context window of the model.

```console
$ ollama run llama3.2
>>> /set parameter num_ctx 32768
Set parameter 'num_ctx' to '32768'
>>> /save llama3.2-32k
Created new model 'llama3.2-32k'
>>> /bye
$ ollama run llama3.2-32k "Summarize this file: $(cat README.md)"
This file appears to be a documentation or README page for the Ollama project, which is a framework for building and running large language models.
...
```
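(Editorial aside: the interactive /save workflow above can also be scripted with a Modelfile, which the Ollama docs describe for baking parameters into a model. A sketch; the model name and filename are examples:)

```console
$ cat Modelfile
FROM llama3.2
PARAMETER num_ctx 32768
$ ollama create llama3.2-32k -f Modelfile
```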

@rick-github commented on GitHub (Dec 14, 2024):

Related: #7043


@pdevine commented on GitHub (Dec 17, 2024):

@daniel-j-h thanks for the issue. I definitely agree it's not easy to see right now when the context length is exceeded. I'm going to close this in favor of #7043 though.


@gevzak commented on GitHub (Mar 13, 2025):

Was this implemented at some point? I am unable to find documentation for it.

Reference: github-starred/ollama#51689