[PR #9547] server: allow dynamic token generation limit with num_predict==-2 #23536

Open
opened 2026-04-19 17:04:22 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/9547
Author: @rick-github
Created: 3/6/2025
Status: 🔄 Open

Base: main ← Head: num_predict


📝 Commits (4)

  • bf42297 server: num_predict==-2 fills context buffer
  • f33198a Merge branch 'ollama:main' into num_predict
  • 094747e Follow e53b3cbd0 and switch from string to enum.
  • 7547349 Merge branch 'ollama:main' into num_predict

📊 Changes

3 files changed (+9 additions, -1 deletions)

Changed files:

📝 docs/modelfile.md (+1 -1)
📝 runner/llamarunner/runner.go (+4 -0)
📝 runner/ollamarunner/runner.go (+4 -0)

📄 Description

Currently, num_predict either allows infinite generation or restricts token generation to a static limit. This change follows llama.cpp in allowing num_predict==-2 to fill the remaining context buffer.

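As a rough illustration of the intended semantics, a runner resolving the effective generation limit might handle num_predict as below. This is a minimal Go sketch under assumed names (effectiveLimit, numCtx, promptLen are illustrative), not the actual identifiers changed in this PR.

```go
package main

import "fmt"

// effectiveLimit returns how many tokens may still be generated.
//   numPredict == -2: fill the remaining context buffer
//   numPredict  < 0 (other negatives, e.g. -1): no limit (infinite generation)
//   numPredict >= 0: static limit
// Names and exact behavior are assumptions for illustration only.
func effectiveLimit(numPredict, numCtx, promptLen int) int {
	switch {
	case numPredict == -2:
		return numCtx - promptLen // generate until the context is full
	case numPredict < 0:
		return int(^uint(0) >> 1) // effectively unlimited
	default:
		return numPredict
	}
}

func main() {
	fmt.Println(effectiveLimit(-2, 4096, 512))  // 3584: fill remaining context
	fmt.Println(effectiveLimit(128, 4096, 512)) // 128: static limit
}
```

With a change like this applied, a client could presumably request context-fill generation by setting the parameter to -2, e.g. `PARAMETER num_predict -2` in a Modelfile (the PR touches docs/modelfile.md) or `"options": {"num_predict": -2}` in an API request.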

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 17:04:22 -05:00

Reference: github-starred/ollama#23536