[PR #442] [MERGED] treat stop as stop sequences, not exact tokens #15422

Closed
opened 2026-04-16 04:58:49 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/442
Author: @sqs
Created: 8/30/2023
Status: Merged
Merged: 8/30/2023
Merged by: @jmorganca

Base: `main` ← Head: `stop-token-contains`


📝 Commits (1)

  • 33ae533 treat stop as stop sequences, not exact tokens

📊 Changes

4 files changed (+109 additions, -17 deletions)


📝 `docs/modelfile.md` (+1 -1)
📝 `llm/llama.go` (+28 -15)
➕ `llm/llama_test.go` (+79 -0)
📝 `server/images.go` (+1 -1)

📄 Description

The `stop` option to the generate API is a list of sequences that should cause generation to stop. Although these are commonly called "stop tokens", they do not necessarily correspond to LLM tokens (per the LLM's tokenizer). For example, if the caller sends a generate request with `"stop":["\n"]`, then generation should stop on any token containing `\n` (and trim `\n` from the output), not just if the token exactly matches `\n`. If `stop` were interpreted strictly as LLM tokens, then it would require callers of the generate API to know the LLM's tokenizer and enumerate many tokens in the `stop` list.
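As a rough illustration of the containment semantics described above (a minimal sketch, not the PR's actual diff; `findStop` and its signature are invented for this example):

```go
package main

import (
	"fmt"
	"strings"
)

// findStop reports whether any stop sequence occurs in the generated
// text so far, and if so, the offset at which the output should be cut.
// Matching is substring containment, not exact-token equality, so a
// stop sequence like "\n" fires on any token that contains a newline.
func findStop(generated string, stops []string) (cutAt int, found bool) {
	for _, stop := range stops {
		if i := strings.Index(generated, stop); i >= 0 {
			return i, true
		}
	}
	return 0, false
}

func main() {
	// Mirrors the curl example below: the stop sequence "ot" appears
	// inside the token " not", so output is truncated to "... is n".
	generated := " The code you provided is not"
	if cut, ok := findStop(generated, []string{"ot"}); ok {
		fmt.Printf("%q\n", generated[:cut]) // " The code you provided is n"
	}
}
```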

Fixes https://github.com/jmorganca/ollama/issues/295.

Example output (note that generation ends on a token ` not` that is truncated to ` n` because the stop sequence is `ot`):

```
% curl -d '{"prompt":"const primes=[1,2,3,","model":"codellama:7b","options":{"seed":1337,"temperature":0,"num_ctx":100,"stop":["ot"]}}' http://localhost:11434/api/generate
{"model":"codellama:7b","created_at":"2023-08-30T05:17:54.435096Z","response":" The","done":false}
{"model":"codellama:7b","created_at":"2023-08-30T05:17:54.486337Z","response":" code","done":false}
{"model":"codellama:7b","created_at":"2023-08-30T05:17:54.53943Z","response":" you","done":false}
{"model":"codellama:7b","created_at":"2023-08-30T05:17:54.593747Z","response":" provided","done":false}
{"model":"codellama:7b","created_at":"2023-08-30T05:17:54.648514Z","response":" is","done":false}
{"model":"codellama:7b","created_at":"2023-08-30T05:17:54.702975Z","response":" n","done":false}
{"model":"codellama:7b","created_at":"2023-08-30T05:17:54.702999Z","done":true, ...}
```

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 04:58:49 -05:00
Reference: github-starred/ollama#15422