[PR #14523] llm: make token repeat limit configurable and fix done response #40584

Open
opened 2026-04-23 01:27:00 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14523
Author: @4RH1T3CT0R7
Created: 3/1/2026
Status: 🔄 Open

Base: main ← Head: fix/token-repeat-limit-configurable


📝 Commits (1)

  • c0f67e6 llm: make token repeat limit configurable and fix done response

📊 Changes

6 files changed (+37 additions, -9 deletions)


📝 api/types.go (+1 -0)
📝 docs/api.md (+1 -0)
📝 docs/modelfile.mdx (+1 -0)
📝 docs/openapi.yaml (+3 -0)
📝 llm/server.go (+18 -9)
📝 openai/openai.go (+13 -0)

📄 Description

Summary

  • Make the hardcoded token repeat limit (30) configurable via new token_repeat_limit option (default: 0 = disabled)
  • Fix incorrect done: false response when repeat limit is hit — now properly sends done: true with done_reason: "repeat"
  • Skip empty/whitespace-only tokens in repeat tracking to prevent false positives
  • Map "repeat" to "stop" in OpenAI compatibility layer for spec compliance
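The repeat check described above can be sketched roughly as follows. This is a hypothetical standalone version, not the actual `llm/server.go` code: the function name and token handling are assumptions, but it shows the three behaviors the PR claims — a limit of 0 disables detection, whitespace-only tokens are skipped, and only consecutive identical tokens count.

```go
package main

import (
	"fmt"
	"strings"
)

// checkRepeat reports whether the last run of consecutive identical,
// non-whitespace tokens has reached the configured limit.
// limit <= 0 disables detection entirely (the new default).
func checkRepeat(tokens []string, limit int) bool {
	if limit <= 0 {
		return false
	}
	count := 0
	var last string
	for _, tok := range tokens {
		if strings.TrimSpace(tok) == "" {
			continue // skip empty/whitespace-only tokens
		}
		if tok == last {
			count++
		} else {
			last = tok
			count = 1
		}
		if count >= limit {
			return true
		}
	}
	return false
}

func main() {
	// OCR-style output: dots separated by whitespace tokens.
	dots := []string{".", " ", ".", " ", "."}
	fmt.Println(checkRepeat(dots, 0)) // disabled → false
	fmt.Println(checkRepeat(dots, 3)) // spaces skipped, three "." in a row → true
}
```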

Context

The hardcoded repeat limit of 30 tokens in llm/server.go caused false positives on valid model output (e.g., OCR dots . . . . . . . as reported in #14117). When triggered, the function returned ctx.Err() without sending a final done: true response, leaving clients in an ambiguous state.

Changes

  • llm/server.go — Add DoneReasonRepeat enum, fix repeat limit check to use the configurable option and send a proper done response, skip empty tokens
  • api/types.go — Add TokenRepeatLimit field to the Options struct
  • openai/openai.go — Map "repeat" finish reason to "stop" for OpenAI API compatibility
  • docs/modelfile.mdx — Document the token_repeat_limit parameter
  • docs/api.md — Add the option to the API docs example
  • docs/openapi.yaml — Add the option to the OpenAPI specification
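The `openai/openai.go` change maps the internal reason onto the OpenAI vocabulary, since `"repeat"` is not a finish reason the OpenAI spec defines. A minimal sketch (the helper name is hypothetical, not the actual function in the PR):

```go
package main

import "fmt"

// toFinishReason converts an internal done reason to an OpenAI-compatible
// finish_reason. The internal "repeat" reason is not in the OpenAI spec,
// so it is reported as "stop"; other reasons pass through unchanged.
func toFinishReason(doneReason string) string {
	switch doneReason {
	case "repeat":
		return "stop"
	default:
		return doneReason
	}
}

func main() {
	fmt.Println(toFinishReason("repeat")) // stop
	fmt.Println(toFinishReason("length")) // length
}
```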

Usage

# Via API option
curl http://localhost:11434/api/generate -d '{
  "model": "glm-ocr",
  "prompt": "OCR this image",
  "options": {"token_repeat_limit": 100}
}'

# Via Modelfile
PARAMETER token_repeat_limit 100

Test plan

  • Verify token_repeat_limit: 0 (default) disables repeat detection — no false positives on repeated dots/patterns
  • Verify token_repeat_limit: 100 stops generation after 100 consecutive identical tokens
  • Verify response includes done: true and done_reason: "repeat" when limit is hit
  • Verify OpenAI /v1/chat/completions endpoint returns finish_reason: "stop" (not "repeat")
  • Verify existing tests pass

Fixes #14117


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-23 01:27:01 -05:00
Reference: github-starred/ollama#40584