[PR #3092] [CLOSED] Limit num_predict to num_ctx #16335

Closed
opened 2026-04-16 05:25:29 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/3092
Author: @jmorganca
Created: 3/13/2024
Status: Closed

Base: main ← Head: jmorganca/limit


📝 Commits (1)

  • ca7c3f7 limit num_predict to num_ctx

📊 Changes

1 file changed (+13 additions, -0 deletions)


📝 llm/dyn_ext_server.go (+13 -0)

📄 Description

This limits the number of tokens generated to the context window size, allowing at most two "context shifts" should the context window limit be passed. It also provides a hard token limit that stops "runaway" generations from smaller models that would otherwise continue indefinitely (e.g. in JSON mode).

Ideally we would have no context shifts at all: we would only generate the number of tokens left in the context window after the prompt. However, this felt like a simple change that eases our way toward that, since chat prompts can get quite large and we would have to change our prompt-trimming strategy to leave enough space for longer responses.
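
The diff itself is not reproduced here, but as a rough illustration of the clamping behavior described above, here is a minimal Go sketch. The function name `clampNumPredict` and the plain `int` parameters are assumptions for illustration only; the actual change lives in `llm/dyn_ext_server.go` and operates on the server's option handling.

```go
package main

import "fmt"

// clampNumPredict caps the requested number of tokens to generate at the
// context window size. A negative num_predict conventionally means
// "generate until done", so it is also replaced by num_ctx here to stop
// runaway generation. (Hypothetical helper, not the actual PR diff.)
func clampNumPredict(numPredict, numCtx int) int {
	if numPredict < 0 || numPredict > numCtx {
		return numCtx
	}
	return numPredict
}

func main() {
	fmt.Println(clampNumPredict(-1, 2048))   // "unlimited" -> 2048
	fmt.Println(clampNumPredict(4096, 2048)) // over the window -> 2048
	fmt.Println(clampNumPredict(128, 2048))  // within the window -> 128
}
```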


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 05:25:29 -05:00

Reference: github-starred/ollama#16335