[PR #15212] sample: add segment-level repetition loop detection #77372

Open
opened 2026-05-05 10:03:01 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15212
Author: @Frank-Schruefer
Created: 4/2/2026
Status: 🔄 Open

Base: main ← Head: feature/repeat-line-loop-detection


📝 Commits (2)

  • 7329bcf sample: add segment-level repetition loop detection
  • cc18cbf sample: add ':' to default repeat_line_delimiters

📊 Changes

7 files changed (+407 additions, -59 deletions)


📝 api/types.go (+16 -0)
📝 runner/llamarunner/runner.go (+123 -34)
📝 runner/ollamarunner/runner.go (+18 -0)
📝 sample/samplers.go (+104 -14)
📝 sample/samplers_benchmark_test.go (+4 -4)
📝 sample/samplers_test.go (+96 -6)
📝 x/mlxrunner/pipeline.go (+46 -1)

📄 Description

Problem

Thinking models (and other models at low temperature) can enter infinite
repetition loops where the same sentence or paragraph repeats endlessly.
The existing repeat_penalty mechanism operates on token-level n-gram
recency and cannot break multi-token phrase repetition — the penalty
window would need to cover the entire repeated phrase to be effective,
which is impractical.

Solution

Add segment-level loop detection with four new sampling options:

| Option | Type | Default | Description |
|---|---|---|---|
| repeat_line_window | int | 0 (off) | Number of past segments to track; 0 disables the feature |
| repeat_line_delimiters | string | "\n.!?:" | Characters that end a segment |
| repeat_line_temp_boost | float32 | 0.5 | Temperature added when a loop is detected |
| repeat_line_min_length | int | 20 | Minimum segment length to consider (avoids false positives from short phrases like "Ok.") |
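
These options would be set per request like any other sampling option. A hedged sketch of an /api/generate request body that enables the feature (the option names come from the table above; the model name and values are illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildRequest sketches an /api/generate request body that enables the
// proposed loop-detection options. Values are examples, not recommendations.
func buildRequest() map[string]any {
	return map[string]any{
		"model":  "llama3",
		"prompt": "Why is the sky blue?",
		"options": map[string]any{
			"repeat_line_window":     8,   // track the last 8 completed segments
			"repeat_line_temp_boost": 0.5, // add 0.5 temperature while looping
			"repeat_line_min_length": 20,  // ignore segments shorter than 20 chars
		},
	}
}

func main() {
	out, _ := json.MarshalIndent(buildRequest(), "", "  ")
	fmt.Println(string(out))
}
```

Leaving repeat_line_window at its default of 0 keeps the feature off entirely.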

When repeat_line_window > 0, the sampler accumulates token text into
segments separated by delimiter characters. When a completed segment
exactly matches any of the last repeat_line_window segments, the
effective sampling temperature is raised by repeat_line_temp_boost
for all subsequent tokens until a non-matching segment breaks the loop.

The minimum effective temperature during a loop is clamped to
repeat_line_temp_boost * 2, ensuring the boost is meaningful even
when base temperature is 0 (greedy decoding).

Design decisions

  • Off by default (repeat_line_window=0): no behaviour change for existing users
  • Exact-match only: substring/fuzzy matching adds complexity, and exact matching catches the real-world failure mode (deterministic loops produce identical text)
  • Segment-level, not token-level: the loop manifests at the phrase level; token-level detection misses it
  • Additive temperature boost: preserves the relative shape of the distribution while injecting enough randomness to escape the attractor
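
The last point can be seen numerically: dividing logits by a larger temperature flattens the softmax but never reorders it. A small illustration with toy logits (the values here are arbitrary, chosen only to show the effect):

```go
package main

import (
	"fmt"
	"math"
)

// softmax computes the sampling distribution over logits at temperature t.
func softmax(logits []float64, t float64) []float64 {
	out := make([]float64, len(logits))
	var sum float64
	for i, l := range logits {
		out[i] = math.Exp(l / t)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

func main() {
	logits := []float64{4, 2, 1} // toy logits; ranking is the same at any temperature
	cold := softmax(logits, 0.7)
	hot := softmax(logits, 0.7+0.5) // additive boost flattens the distribution
	fmt.Printf("top-token probability: cold %.3f, hot %.3f\n", cold[0], hot[0])
}
```

The boosted distribution gives the looping top token less mass while keeping the relative ordering intact, which is what lets sampling escape the attractor without degenerating into noise.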

Implementation

  • sample/samplers.go: core detection and boost logic (ollamarunner path)
  • runner/llamarunner/runner.go: loop detection for the llama.cpp runner via llama.SamplingContext recreation
  • x/mlxrunner/pipeline.go: loop detection for the MLX runner via direct Sampler.Temperature mutation
  • api/types.go: new fields on Options, defaults in DefaultOptions()
  • sample/samplers_test.go: unit test covering detection, resolution, and sliding window

Testing

go test ./sample/...

🤖 Co-authored with Claude Code


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 10:03:01 -05:00
Reference: github-starred/ollama#77372