[GH-ISSUE #15783] Go sampler (ollamarunner) silently ignores repeat_penalty, frequency_penalty, and presence_penalty #56569

Open
opened 2026-04-29 11:03:01 -05:00 by GiteaMirror · 1 comment

Originally created by @42euge on GitHub (Apr 24, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15783

What happened

The Go-native sampler used by models on the `ollamarunner` path (Gemma 4 and other newer models) accepts `repeat_penalty`, `frequency_penalty`, and `presence_penalty` via the API but silently ignores them. Only `temperature`, `top_k`, `top_p`, and `min_p` are implemented.

The `llamarunner` path (used by older models) delegates these options to llama.cpp's C++ sampler, where they work correctly.

How to reproduce

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e4b",
  "messages": [{"role": "user", "content": "Transcribe:", "images": ["<base64 audio>"]}],
  "options": {"repeat_penalty": 5.0}
}'
```

Even with `repeat_penalty: 5.0`, the model produces output identical to `repeat_penalty: 1.0`, because the value is never read by the sampler.

Where the bug is

`sample/samplers.go`: the `Sampler` struct and `NewSampler()` accept only `temperature`, `topK`, `topP`, `minP`, `seed`, and `grammar`. The repeat/frequency/presence penalty fields defined in `api/types.go` (lines 593-597), with defaults (lines 1066-1069), are never passed through.

`runner/ollamarunner/runner.go` line ~890: the `NewSampler()` call passes only six arguments, omitting the penalty options from `req.Options`.

Impact

This affects all models using the `ollamarunner` path. It is most visible in audio transcription (Gemma 4 e4b), where the missing repeat penalty causes severe repetition loops: 84-93% word error rate on longer utterances because the model generates the same phrases in a loop.

Benchmark results from a 7-sentence voice corpus (actual voice recordings):

| Sentence type | Without penalty (current) | With penalty (fix) |
|---|---|---|
| Simple sentence | 0.0% WER | 0.0% WER |
| Long paragraph (25s) | 84.7% WER | 30.6% WER |
| Technical jargon | 60.6% WER | 33.3% WER |
| Paragraph (15s) | 2.5% WER | 0.0% WER |

Proposed fix

  1. Add repeatPenalty, frequencyPenalty, presencePenalty, and repeatLastN fields to the Sampler struct
  2. Implement a repeatPenalize() transform in sample/transforms.go matching llama.cpp's algorithm
  3. Maintain a token history ring buffer (capped at repeatLastN)
  4. Apply the penalty before topK/temperature/softmax in the sampling pipeline
  5. Wire the options through in runner/ollamarunner/runner.go

I have a working implementation with tests ready to submit as a PR. ~224 lines across 6 files (including tests).

Related: #9278 (sampler interface TODO)


@42euge commented on GitHub (Apr 24, 2026):

Let me know if I need to run additional tests to get this merged. Thanks!


Reference: github-starred/ollama#56569