[GH-ISSUE #12065] Does Ollama support --ignore-eos? #8013

Closed
opened 2026-04-12 20:14:04 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @DoubleRedX on GitHub (Aug 25, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12065

It seems that the "num_predict" parameter only sets an upper bound on token generation. If I want the output to be exactly 128 tokens long, I also need the cooperation of '--ignore-eos'. I found the corresponding parameter in llama.cpp. Does Ollama support it?
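
For illustration, a minimal Go sketch of the behavior described above, assuming a local Ollama server on the default port and a pulled model named "llama3" (both assumptions): num_predict only caps generation, so eval_count can come back well below 128 when the model emits an end-of-sequence token early.

// Minimal sketch: shows num_predict acting as an upper bound only.
// Assumes a local Ollama server and a model named "llama3".
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model":  "llama3",
		"prompt": "Say hi.",
		"stream": false,
		"options": map[string]any{
			"num_predict": 128, // upper bound only, not an exact length
		},
	})
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Response  string `json:"response"`
		EvalCount int    `json:"eval_count"` // tokens actually generated
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	// Often prints far fewer than 128: an EOS token ends generation early.
	fmt.Printf("generated %d tokens (asked for up to 128)\n", out.EvalCount)
}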

The relevant llama.cpp code:

// With --ignore-eos, llama.cpp adds a -INFINITY logit bias for every
// end-of-generation (EOG) token, so the sampler can never pick one.
if (params.sampling.ignore_eos) {
    for (llama_token i = 0; i < llama_vocab_n_tokens(vocab); i++) {
        if (llama_vocab_is_eog(vocab, i)) {
            LOG_INF("%s: added %s logit bias = %f\n", __func__, common_token_to_piece(lctx, i).c_str(), -INFINITY);
            params.sampling.logit_bias.push_back({i, -INFINITY});
        }
    }
}
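
For comparison, a minimal Go translation of the same idea, using hypothetical names (suppressEOG, eogTokens) rather than any real Ollama API: every EOG token's logit is forced to -Inf so it can never be sampled.

// Hypothetical sketch (not Ollama's API): mirrors the logit_bias entries
// that llama.cpp pushes when --ignore-eos is set.
package main

import (
	"fmt"
	"math"
)

// suppressEOG forces the logits of all end-of-generation tokens to -Inf
// so the sampler can never select them.
func suppressEOG(logits []float32, eogTokens []int) {
	for _, id := range eogTokens {
		if id >= 0 && id < len(logits) {
			logits[id] = float32(math.Inf(-1)) // token can never win sampling
		}
	}
}

func main() {
	logits := []float32{0.1, 2.3, 5.0, 0.7} // toy vocabulary of 4 tokens
	suppressEOG(logits, []int{2})           // pretend token 2 is <eos>
	fmt.Println(logits)                     // [0.1 2.3 -Inf 0.7]
}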
Author
Owner

@DoubleRedX commented on GitHub (Aug 25, 2025):

runner/llamarunner/runner.go

	// if it's an end of sequence token, break
	if s.model.TokenIsEog(token) {
		// TODO (jmorganca): we should send this back
		// as it's important for the /api/generate context
		// seq.responses <- piece

		s.removeSequence(i, llm.DoneReasonStop)
		continue
	}

Found it. The runner stops the sequence unconditionally when it sees an end-of-generation token here, so the llama.cpp ignore-eos code never takes effect; this is the check that would need to be made skippable.
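
For discussion, a self-contained Go sketch of what such an option could look like, written as a toy simulation rather than a patch to runner.go; the names ignoreEOS and isEOG are hypothetical, not Ollama's actual fields.

// Hypothetical sketch: the EOG check is skipped when ignoreEOS is set,
// so only numPredict ends the sequence.
package main

import "fmt"

func generate(tokens []int, isEOG func(int) bool, numPredict int, ignoreEOS bool) []int {
	var out []int
	for _, tok := range tokens {
		if len(out) >= numPredict {
			break // num_predict is always the hard upper bound
		}
		// Mirrors the runner's "if s.model.TokenIsEog(token)" branch,
		// but gated behind the hypothetical ignoreEOS option.
		if !ignoreEOS && isEOG(tok) {
			break // today's behavior: stop on end-of-generation
		}
		out = append(out, tok)
	}
	return out
}

func main() {
	isEOG := func(t int) bool { return t == 0 } // pretend token 0 is <eos>
	stream := []int{5, 7, 0, 9, 4, 0, 8}

	fmt.Println(generate(stream, isEOG, 5, false)) // [5 7] — stops at <eos>
	fmt.Println(generate(stream, isEOG, 5, true))  // [5 7 0 9 4] — ignores <eos>
}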
