[PR #9393] [CLOSED] runner: reduce deepseek failures by allowing dynamic num_predict behaviour. #12943

Closed
opened 2026-04-13 00:13:17 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/9393
Author: @rick-github
Created: 2/27/2025
Status: Closed

Base: main ← Head: num_predict


📝 Commits (1)

  • bf42297 server: num_predict==-2 fills context buffer

📊 Changes

3 files changed (+9 additions, -1 deletions)

View changed files

📝 docs/modelfile.md (+1 -1)
📝 runner/llamarunner/runner.go (+4 -0)
📝 runner/ollamarunner/runner.go (+4 -0)

📄 Description

The Deepseek architecture doesn't support K-shift, so it crashes when the context buffer is exceeded. The current workaround is to set num_ctx and num_predict to values that make crashes less likely, but this can still fail because the input prompt is unbounded. This PR lets num_predict (set to -2) dynamically limit the number of generated tokens to the space remaining in the context buffer.
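The behaviour described above can be sketched as follows. This is an illustrative sketch, not the actual diff from the PR: the function and parameter names here are hypothetical, not the identifiers used in runner.go.

```go
package main

import "fmt"

// effectiveNumPredict sketches the num_predict == -2 semantics described
// in this PR: rather than a fixed generation limit, the limit becomes
// however many context slots remain after the prompt, so generation
// fills the context buffer without ever exceeding it (which would
// otherwise trigger a K-shift that Deepseek models cannot perform).
func effectiveNumPredict(numPredict, numCtx, promptLen int) int {
	if numPredict == -2 {
		// Cap generation at the unused portion of the context buffer.
		return numCtx - promptLen
	}
	// All other values (including -1, "unlimited") pass through unchanged.
	return numPredict
}

func main() {
	// With a 4096-token context and a 1000-token prompt, -2 caps
	// generation at the remaining 3096 tokens.
	fmt.Println(effectiveNumPredict(-2, 4096, 1000))
	// A fixed positive limit is left as-is.
	fmt.Println(effectiveNumPredict(128, 4096, 1000))
}
```

Because the cap is computed per request from the actual prompt length, it holds even though prompt sizes are unbounded, which is what the fixed num_ctx/num_predict workaround could not guarantee.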

Fixes: #5975
Fixes: #8074
Fixes: #8571
Fixes: #8599
Fixes: #8602
Fixes: #8614
Fixes: #8924
Fixes: #9010
Fixes: #9047
Fixes: #9064
Fixes: #9105
Fixes: #9171
Fixes: #9248
Fixes: #9410


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.


Reference: github-starred/ollama#12943