[GH-ISSUE #13388] v0.13.1+ Breaks Deepseek R1 671B system messages #55351

Open
opened 2026-04-29 08:59:46 -05:00 by GiteaMirror · 8 comments

Originally created by @MrEdigital on GitHub (Dec 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13388

What is the issue?

The issue was introduced in v0.13.1 and persists in the latest release (v0.13.2).

Description: The "system" message is not making it through to Deepseek R1 671B.

Relevant log output


OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.13.2

GiteaMirror added the bug, needs more info labels 2026-04-29 08:59:48 -05:00

@rick-github commented on GitHub (Dec 9, 2025):

```console
$ ollama -v
ollama version is 0.13.2
$ curl -s localhost:11434/api/chat -d '{
  "model":"deepseek-r1:671b",
  "messages":[
    {"role":"system","content":"speak like a pirate"},
    {"role":"user","content":"hello"}
  ],
  "stream":false
}' | jq -r .message.content
Ahoy, matey! Welcome aboard! What brings ye to these digital waters today? 🏴‍☠️
```


@mchiang0610 commented on GitHub (Dec 11, 2025):

@MrEdigital May I ask how you are adding the system messages to help us reproduce this?


@MrEdigital commented on GitHub (Dec 18, 2025):

Here's a minimal reproduction:

```console
$ curl -s http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:671b-0528-q4_K_M",
  "messages": [
    {"role": "system", "content": "You love counting 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36"},
    {"role": "user", "content": "What do you love to do?"}
  ]
}'
```

You will find that it ignores the system prompt. If you remove one number, it will suddenly see it (and go much slower).

There seems to be some threshold that is reached much more easily in recent versions. Numbers get it there quickly. You can also remove a number and add, say, ~10 commas throughout, and it will hit that threshold, or ~8 periods, etc.

The exact point varies with the position and count of the numbers removed, but the trigger can be just about any symbol scattered throughout a prompt. A sizable prompt with numbering, bulleting, punctuation, and raw text can easily hit it now.

In summary: small system prompts still work, but there is now a threshold which, once reached, causes the system prompt to be skipped entirely.
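
For anyone trying to pin down the threshold, a probe along these lines may help (an untested sketch; the model name comes from the repro above, and the counts are only illustrative). It grows the digit run in the system prompt and prints the start of each reply, so you can see where the instruction stops being honored:

```console
$ for n in 30 32 34 36; do
    nums=$(seq -s ' ' 1 "$n")
    echo "--- $n numbers ---"
    curl -s localhost:11434/api/chat -d "{
      \"model\": \"deepseek-r1:671b-0528-q4_K_M\",
      \"messages\": [
        {\"role\": \"system\", \"content\": \"You love counting $nums\"},
        {\"role\": \"user\", \"content\": \"What do you love to do?\"}
      ],
      \"stream\": false
    }" | jq -r .message.content | head -c 200
    echo
  done
```

Replies that still mention counting indicate the system prompt survived; replies that ignore it have crossed the threshold.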


@MrEdigital commented on GitHub (Dec 18, 2025):

I've confirmed again that not only is the prompt discarded in v0.13.1+ only (that is to say, it works in v0.13.0), but when the prompt is adhered to in these later versions, it also goes MUCH, MUCH slower. We're talking ~0:10 versus ~2:45 just to get started, and then much slower on each streamed response, too.


@rick-github commented on GitHub (Dec 19, 2025):

This is caused by https://github.com/ollama/ollama/commit/5c1063df7fb6850410a35ddfb92cd6efb818fa6e. Removal of the `attention.key_length_mla` check results in a pre-tokenizer that duplicates tokens in the prompt when digits are present:

```console
$ ollama run deepseek-r1:671b a b c d repeat the preceding letters
Thinking...
We are given the pattern: "a b c d repeat the preceding letters"
We need to generate a sequence of letters following the pattern.
...
```

```console
$ ollama run deepseek-r1:671b 1 2 3 4 repeat the preceding numbers
Thinking...
First, the user said: "1 1 2 1 1 2 3 1 1 2 1 2 3 4 repeat the preceding numbers"
...
```

The token duplication causes the prompt processor to think it will run out of room in the context buffer and so it discards the system prompt.
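
The duplication can also be observed from outside the server by comparing reported prompt token counts (a sketch, assuming the non-streaming /api/chat response carries the documented `prompt_eval_count` field). On affected versions, the digit-heavy prompt should report noticeably more prompt tokens than the comparable letter-only prompt:

```console
$ for p in 'a b c d repeat the preceding letters' \
           '1 2 3 4 repeat the preceding numbers'; do
    curl -s localhost:11434/api/chat -d "{
      \"model\": \"deepseek-r1:671b\",
      \"messages\": [{\"role\": \"user\", \"content\": \"$p\"}],
      \"stream\": false
    }" | jq "{prompt: \"$p\", prompt_eval_count}"
  done
```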


@MrEdigital commented on GitHub (Dec 20, 2025):

Makes sense. We should get a fix lined up for this.


@MrEdigital commented on GitHub (Dec 31, 2025):

Sounds like we can remove the `needs more info` label now.


@MrEdigital commented on GitHub (Feb 12, 2026):

This still is not entirely fixed in 0.15.6. What I am seeing:

  1. It does not discard the system prompt entirely, but it does miss fundamental instructions. In the example given above, it treats the system prompt as its own message and believes it was counting in a nonsensical way. It completely misses any further instructions, fixating entirely on that.
  2. It is VERY, VERY slow. It takes minutes to begin streaming, where 0.12.11 takes a fraction of the time (a quick way to measure this is sketched below). This is consistent with my original observation: "You will find that it ignores the system prompt. If you remove one number, it will suddenly see it (and go much slower)."
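
As a rough way to compare the startup delay across versions, a probe like the following (a sketch; it approximates time-to-first-token with curl's time-to-first-byte timer, reusing the repro prompt with a shortened digit run) can be run under each Ollama version:

```console
$ curl -sN -o /dev/null -w 'time_to_first_token: ~%{time_starttransfer}s\n' \
    localhost:11434/api/chat -d '{
      "model": "deepseek-r1:671b-0528-q4_K_M",
      "messages": [
        {"role": "system", "content": "You love counting 1 2 3 4 5 6 7 8 9 10"},
        {"role": "user", "content": "What do you love to do?"}
      ],
      "stream": true
    }'
```

With streaming enabled, the first response byte arrives with the first token, so the reported time tracks how long the model takes to begin streaming.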

Due to speed and context drops, I will be reverting again to 0.12.11.

@rick-github Please remove the "needs more info" tag. This is a breaking bug for all local users of Deepseek 671b, and it appears to be at least mostly understood now. A fix is urgently needed.

Reference: github-starred/ollama#55351