[GH-ISSUE #9010] Error:" the current context does not support k-shift " deepseek-r1:671b crashes in memory after answering several questions and then reloads to memory again #52368

Closed
opened 2026-04-28 23:06:12 -05:00 by GiteaMirror · 29 comments

Originally created by @sobermh on GitHub (Feb 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9010

What is the issue?

Hey Ollama community,

Description

After I run deepseek-r1:671b, the first two questions are answered normally, but when I ask the 3rd question the model crashes in memory and starts trying to reload into memory.

I'm running in a pure-CPU environment, and after loading deepseek-r1:671b there is some memory left (details attached).

What should I do?

Tried methods

After checking past issues, I tried creating a new Modelfile and running a new model, but after several Q&A rounds the same problem still occurs.

Modelfile

```
ver@ver-PowerEdge-R750:~$ cat Modelfile
FROM deepseek-r1:671b
PARAMETER num_ctx 4096
PARAMETER num_predict 512
```

Memory after deployment

![Image](https://github.com/user-attachments/assets/278d4868-bf11-4305-b84c-00830f46d1cb)

ollama service config

![Image](https://github.com/user-attachments/assets/8707a873-2340-4c8a-82da-ec2a8ad65004)

Error log

![Image](https://github.com/user-attachments/assets/9683632a-b586-42bf-aa23-2841fbb9915c)

Relevant log output

No response
OS

Linux

GPU

No response

CPU

Intel

Ollama version

0.5.7

GiteaMirror added the bug label 2026-04-28 23:06:12 -05:00

@rick-github commented on GitHub (Feb 11, 2025):

https://github.com/ollama/ollama/issues/5975


@sobermh commented on GitHub (Feb 11, 2025):

@rick-github I tried creating a Modelfile and running the newly created model, but that only increased the number of Q&A rounds before failure; once the count of turns grows, the problem still occurs.

My Modelfile:

```
ver@ver-PowerEdge-R750:~$ cat Modelfile
FROM deepseek-r1:671b
PARAMETER num_ctx 4096
PARAMETER num_predict 512
```


@rick-github commented on GitHub (Feb 11, 2025):

If your input tokens + output tokens > num_ctx, the model will fail due to k-shift. So if you want to use longer prompts (multiple rounds of Q&A), you need to increase num_ctx.

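For anyone hitting this: num_ctx can also be raised per request instead of baking it into a Modelfile, by passing it in the options object of Ollama's HTTP API (/api/chat and the num_ctx / num_predict options are part of the documented API). A minimal Go sketch; the 8192 value and the prompt are illustrative, not a recommendation from this thread:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Raise the context window for this request only via options.num_ctx.
	body := []byte(`{
		"model": "deepseek-r1:671b",
		"messages": [{"role": "user", "content": "hello"}],
		"stream": false,
		"options": {"num_ctx": 8192, "num_predict": 512}
	}`)
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```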

@sobermh commented on GitHub (Feb 11, 2025):

@rick-github How much do I need to increase num_ctx? Do I still have to control the user's input? If the input is too long, input tokens + output tokens > num_ctx will still cause the model to crash in memory.


@rick-github commented on GitHub (Feb 11, 2025):

Increase num_ctx so that it's big enough to hold the input tokens and the output tokens. You still have to control the user's input.

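"Control the user's input" here means the client has to bound the conversation history it resends each turn. A rough sketch of one way to do that, dropping the oldest turns until an estimated token count plus the output budget fits in num_ctx; the 4-characters-per-token estimate and all helper names are assumptions, not part of Ollama:

```go
package main

import "fmt"

type message struct {
	Role    string
	Content string
}

// estimateTokens is a crude stand-in for a real tokenizer:
// roughly 4 characters per token for English text.
func estimateTokens(msgs []message) int {
	n := 0
	for _, m := range msgs {
		n += len(m.Content)/4 + 1
	}
	return n
}

// trimHistory drops the oldest messages until the estimated history
// size plus the reserved output budget fits inside numCtx.
func trimHistory(msgs []message, numCtx, numPredict int) []message {
	for len(msgs) > 1 && estimateTokens(msgs)+numPredict > numCtx {
		msgs = msgs[1:] // drop the oldest turn
	}
	return msgs
}

func main() {
	history := []message{
		{"user", "first question ..."},
		{"assistant", "first answer ..."},
		{"user", "second question ..."},
	}
	trimmed := trimHistory(history, 4096, 512)
	fmt.Printf("kept %d of %d messages\n", len(trimmed), len(history))
}
```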

@sobermh commented on GitHub (Feb 11, 2025):

@rick-github When I use the default configuration and chat with deepseek-r1:671b, even simple questions cause the model to crash in memory after the 3rd or 4th conversation turn.

  1. Why does this problem occur under the default configuration?

  2. If the configuration is not changed, can I upgrade the hardware to solve this problem?


@fishreyuu commented on GitHub (Feb 11, 2025):

You can temporarily modify this line of code to keep the model service from crashing:

llama/runner/runner.go line 125: `discard := len(inputs) - s.cache.numCtx + 2048`

and set the following when building the model:

num_ctx 8192
num_predict 4096

But this is only a temporary workaround.

The long-term fix should be to look at the k-shift implementation and enable a sliding window (sliding_window) for the deepseek2 architecture instead of calling k-shift, because the deepseek2 architecture does not support k-shift.

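An annotated reading of that proposed patch, as a standalone sketch (truncateInputs and its arguments are placeholders, not ollama's actual runner code):

```go
package main

import "fmt"

// Sketch of the proposed workaround, not the upstream implementation: when
// the accumulated prompt no longer fits, drop enough of the oldest tokens to
// leave ~2048 free slots, so the k-shift path (which the deepseek2
// architecture does not support) is hit far less often.
func truncateInputs(inputs []int, cacheNumCtx int) []int {
	discard := len(inputs) - cacheNumCtx + 2048
	if discard <= 0 {
		return inputs // still fits with headroom
	}
	return inputs[discard:] // discard the oldest tokens
}

func main() {
	inputs := make([]int, 8192) // pretend the history already fills num_ctx
	fmt.Println("tokens kept:", len(truncateInputs(inputs, 8192)))
}
```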

@sobermh commented on GitHub (Feb 12, 2025):

@fishreyuu I wonder whether the hardware configuration is simply insufficient, because even with short conversations the model crashes in memory after the third or fourth Q&A. That shouldn't be normal.


@fishreyuu commented on GitHub (Feb 12, 2025):

@sobermh You can reduce them:

PARAMETER num_ctx 2048
PARAMETER num_predict 512

and set OLLAMA_NUM_PARALLEL=1 at runtime.


@sobermh commented on GitHub (Feb 12, 2025):

@fishreyuu Can I avoid this error by adding more memory?


@sobermh commented on GitHub (Feb 12, 2025):

> If your input tokens + output tokens > num_ctx, the model will fail due to k-shift. So if you want to use longer prompts (multiple Q&A), you need to increase num_ctx.

@fishreyuu But what this maintainer is saying is that num_ctx must not be too small.
I used this Modelfile:

```
FROM deepseek-r1:671b
PARAMETER num_ctx 4096
PARAMETER num_predict 512
```

with OLLAMA_NUM_PARALLEL=1.

Once the number of answers grows, the model still crashes in memory.


@rick-github commented on GitHub (Feb 12, 2025):

```
FROM deepseek-r1:671b
PARAMETER num_ctx 163840
PARAMETER num_predict 8192
```

@sobermh commented on GitHub (Feb 13, 2025):

@rick-github
My memory size is only 503 GB.

  1. If I upgrade my memory later, will this new model still have the same problem?
  2. Can you provide me with a configuration adapted to my current memory size?

```
root@ver-PowerEdge-R750:/home/ver# ollama run deepseek-r1:671b-fixed
Error: model requires more system memory (3726.0 GiB) than is available (498.3 GiB)
root@ver-PowerEdge-R750:/home/ver# cat Modelfile
FROM deepseek-r1:671b
PARAMETER num_ctx 163840
PARAMETER num_predict 8192
```

@YoungerLwb commented on GitHub (Feb 13, 2025):

@sobermh I also encountered this issue. Can I communicate with you via ?


@sobermh commented on GitHub (Feb 13, 2025):

@YoungerLwb yep
409788696@qq.com


@oOoOoOoll commented on GitHub (Feb 14, 2025):

I think if memory round * num_predict < num_ctx, the k-shift problem will be solved.


@sobermh commented on GitHub (Feb 14, 2025):

@oOoOoOoll What does "memory round" mean?


@oOoOoOoll commented on GitHub (Feb 14, 2025):

> @oOoOoOoll What does "memory round" mean?

The conversation history, i.e. the number of conversation-history rounds.


@sobermh commented on GitHub (Feb 14, 2025):

> > @oOoOoOoll What does "memory round" mean?
>
> The conversation history, i.e. the number of conversation-history rounds.

I think you might be right. I just want to know if there is a way to avoid this situation, i.e. make the inequality always hold.

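A back-of-envelope illustration of that inequality, using the num_ctx and num_predict values from this thread (the ~100 prompt tokens per question is an assumed figure; r1's long chain-of-thought output makes real turns much heavier, which is roughly consistent with the reported crashes after the 3rd or 4th question):

```go
package main

import "fmt"

func main() {
	const numCtx, numPredict, promptTokens = 4096, 512, 100 // promptTokens is assumed
	total := 0
	for round := 1; ; round++ {
		total += promptTokens + numPredict // the whole history is resent every round
		if total > numCtx {
			fmt.Printf("context overflows on round %d (%d > %d tokens)\n", round, total, numCtx)
			break
		}
	}
}
```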

@mariaccc commented on GitHub (Feb 24, 2025):

> FROM deepseek-r1:671b
> PARAMETER num_ctx 163840
> PARAMETER num_predict 8192

@rick-github how do I calculate a suitable num_ctx? Does it depend on the GPU?


@rick-github commented on GitHub (Feb 24, 2025):

Choose a size that allows the model to process the input tokens and generate output tokens to satisfy the requirements of the task you want to use an LLM for. If you don't have any particular requirements, set it as large as you can without causing the model to spill to system RAM.

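One way to turn that advice into arithmetic: KV-cache memory grows roughly linearly with num_ctx, so the largest safe value is about the free memory left after the weights, divided by the per-token cache cost. A sketch with entirely hypothetical numbers (the per-token cost depends on model architecture and quantization; one could estimate it by comparing the memory ollama reports at two different num_ctx values):

```go
package main

import "fmt"

func main() {
	// All figures are placeholders, not measurements of deepseek-r1:671b.
	const (
		availableGiB   = 498.3 // memory available to the server (from the log above)
		weightsGiB     = 400.0 // hypothetical resident size of the weights
		gibPerCtxToken = 0.001 // hypothetical KV-cache cost per context token
	)
	free := availableGiB - weightsGiB
	fmt.Printf("largest num_ctx that fits: ~%d tokens\n", int(free/gibPerCtxToken))
}
```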

@mariaccc commented on GitHub (Feb 25, 2025):

@rick-github thanks, num_ctx 8192 works for me.


@sobermh commented on GitHub (Feb 25, 2025):

@mariaccc Could you share your server configuration and Modelfile details?


@mariaccc commented on GitHub (Feb 25, 2025):

@sobermh 8×A100. When I increase num_ctx, the PROCESSOR column in `ollama ps` changes to CPU/GPU; with num_ctx 8192 it runs fully on GPU.


@sobermh commented on GitHub (Feb 25, 2025):

@mariaccc What is the size of your memory?


@mariaccc commented on GitHub (Feb 25, 2025):

@sobermh 500G


@sobermh commented on GitHub (Feb 25, 2025):

@mariaccc thanks!


@leekaimao commented on GitHub (Feb 25, 2025):

> @sobermh 8×A100. When I increase num_ctx, the PROCESSOR column in ollama ps changes to CPU/GPU; with num_ctx 8192 it runs fully on GPU.

I don't understand.


@leekaimao commented on GitHub (Feb 25, 2025):

I have the same setup as you, with 8 A100s and 500G. Could you please advise on the specific configurations, including the service setup and modelfile configurations? @mariaccc

Reference: github-starred/ollama#52368