[GH-ISSUE #6253] When systemMessage exceeds a certain length, ollama is unable to process it. #65950

Closed
opened 2026-05-03 23:17:57 -05:00 by GiteaMirror · 5 comments

Originally created by @billrenhero on GitHub (Aug 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6253

Originally assigned to: @jmorganca on GitHub.

What is the issue?

[Screenshot 2024-08-08 11 28 37] When the system message exceeds a certain length (likely 4096), Ollama returns "It seems like you're sharing some information, but it's not in a readable format. Could you please rephrase or provide more context about what this is related to? I'd be happy to help if I can!". It was fine when I used version 0.1.47.
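
For reference, a minimal reproduction sketch of the behavior described above, assuming a local Ollama server on the default port and a pulled `llama3.1` model (both assumptions, not taken from the report):

```python
# Sketch only: send a deliberately long system message to a local Ollama server
# and check whether the model actually received it.
# Assumes Ollama on localhost:11434 and a pulled "llama3.1" model.
import requests

# Several thousand tokens of filler, with a "secret" the model can only know
# if the system message survived.
long_system = ("The secret code word is PINEAPPLE. "
               + "This sentence is filler to inflate the prompt. " * 500)

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1",
        "stream": False,
        "messages": [
            {"role": "system", "content": long_system},
            {"role": "user", "content": "What is the secret code word?"},
        ],
    },
    timeout=600,
)
print(resp.json()["message"]["content"])
# On affected versions the reply reads as if no system message was sent,
# e.g. the generic "It seems like you're sharing some information..." answer.
```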

OS

Linux, macOS

GPU

Nvidia, Apple

CPU

AMD, Apple

Ollama version

0.3.4

GiteaMirror added the bug label 2026-05-03 23:17:57 -05:00

@jmorganca commented on GitHub (Aug 8, 2024):

Hi @billrenhero, thanks for the issue. It may be that the context size is too small – setting `num_ctx` to `8192` or higher (if the model supports this) should help. In the meantime I'll look into how we can avoid truncating the system prompt.
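
For example, with the ollama Python client (the package name and model are assumptions here, not something stated in the thread), the option can be passed per request roughly like this:

```python
# Sketch: raise the context window to 8192 tokens for a single chat call.
# Assumes the "ollama" Python client package and a pulled "llama3.1" model.
import ollama

long_system_prompt = "You are a careful assistant. " * 800  # stand-in for a long system prompt

response = ollama.chat(
    model="llama3.1",
    messages=[
        {"role": "system", "content": long_system_prompt},
        {"role": "user", "content": "Summarize your instructions in one sentence."},
    ],
    options={"num_ctx": 8192},  # larger context so the system prompt is not truncated
)
print(response["message"]["content"])
```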


@cannox227 commented on GitHub (Aug 13, 2024):

I'm following; same issue here with Llama3 and 3.1.
A short system prompt is taken into account, but a long one (still below the context length limit) is discarded.

Error seen while debugging:
`"truncating input message which exceed context length"`

As a result, only the user prompt is considered...


@vividfog commented on GitHub (Aug 13, 2024):

Same happens here, for any system prompt of sufficient but still moderate size in RAG use cases — apparently with all models.

Tested with Ollama 0.3.5 and 0.2.8, with Llama3, 3.1 70B, Gemma2 and Mistral-nemo, on dual Nvidia A40 GPUs. All models using default settings after "ollama pull". Sending an OpenAI-style API call via Postman.

All these models should be able to handle the context in question; it's about 2K tokens in size. `/show info` reports a max context size well above this.

If the number of input tokens, as reported by Ollama's own OpenAI-style response, reaches exactly 2048 or more, the system prompt is dropped.

Debug line of interest:

    time=2024-08-13T14:57:35.755Z level=DEBUG source=prompt.go:51 msg="truncating input messages which exceed context length" truncated=2

This is the largest input that works:

    "usage": {
        "prompt_tokens": 2047,
        "completion_tokens": 100,
        "total_tokens": 2147
    }

If I make the system prompt or the user prompt just one character longer than this, only the user message is used as input, and the log line above is shown.

Loaded models and `num_parallel` are set to 4. I also tested with `num_parallel` set to 1, but the behavior is the same.

Frankly, if the input + `max_tokens` is truly oversized, I think we should get an API error in this situation, like with TGI. But in this case these models should all be able to handle inputs of 2048 tokens, running with their default settings.
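
A rough sketch of this kind of probe against the OpenAI-compatible endpoint (base URL, model name, and marker text are assumptions, not taken from the comment):

```python
# Sketch: grow the system prompt and watch for the point where the answer
# no longer reflects it, around the suspected 2048-token boundary.
# Assumes Ollama's OpenAI-compatible endpoint at localhost:11434/v1.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is ignored

marker = "The secret code word is PINEAPPLE. "
for repeats in (100, 200, 250, 300):           # step the system prompt size up
    system_prompt = marker * repeats
    resp = client.chat.completions.create(
        model="llama3.1",                      # assumed model name
        max_tokens=50,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "What is the secret code word?"},
        ],
    )
    answer = resp.choices[0].message.content
    print(resp.usage.prompt_tokens, "PINEAPPLE" in answer)
    # Per the report: once the combined prompt crosses ~2048 tokens, the system
    # message is dropped and the code word disappears from the answer.
```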


@cannox227 commented on GitHub (Aug 13, 2024):

@jmorganca
Here is a full example that reproduces the issue: [issue](https://github.com/ollama/ollama/issues/6176#issuecomment-2286978001)


@jmorganca commented on GitHub (Sep 2, 2024):

This may be because the input is longer than the context window, so inputs are truncated until they fit. In this case it seems the system message is what gets truncated. Ollama doesn't currently return an error when truncation is required, so to extend the context window you'll need to use the `num_ctx` option:

"options": {
  "num_ctx": 8192
}

Let me know if that helps.
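
For completeness, the same option sent through the native `/api/chat` endpoint over plain HTTP, i.e. the same kind of request shown in the reproduction sketch above with the `options` field added (host, model name, and prompt text are placeholders):

```python
# Sketch: the num_ctx fix expressed as a raw HTTP request to Ollama's /api/chat.
import requests

payload = {
    "model": "llama3.1",
    "stream": False,
    "messages": [
        {"role": "system", "content": "A long system prompt goes here..."},
        {"role": "user", "content": "Hello"},
    ],
    "options": {"num_ctx": 8192},  # extend the context window so the system prompt fits
}
resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=600)
print(resp.json()["message"]["content"])
```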
