[PR #3080] [MERGED] token repeat limit for prediction requests #73350

Closed
opened 2026-05-05 05:08:27 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/3080
Author: @BruceMacD
Created: 3/12/2024
Status: Merged
Merged: 3/13/2024
Merged by: @BruceMacD

Base: main ← Head: brucemacd/json-looping


📝 Commits (3)

  • 4806747 token repeat limit for prediction requests
  • b2fb365 if/else
  • a989d49 Revert "if/else"

📊 Changes

1 file changed (+27 additions, -7 deletions)


📝 llm/dyn_ext_server.go (+27 -7)

📄 Description

  • abort prediction (generate/chat) requests when a token repeat limit is hit
  • this prevents JSON-format infinite loops
  • this prevents a stuck request from starving the other queued requests
  • move completion cancellation to its own function
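The check the first bullet describes can be sketched as a running count of consecutive identical tokens. This is an illustrative Python sketch; the actual implementation lives in Go in llm/dyn_ext_server.go, and the limit value and function name here are assumptions, not the real constants:

```python
REPEAT_LIMIT = 30  # illustrative threshold; not the actual value in llm/dyn_ext_server.go

def should_abort(tokens, limit=REPEAT_LIMIT):
    """Return True once the same token has been emitted `limit` times in a row."""
    repeats = 1
    for prev, cur in zip(tokens, tokens[1:]):
        repeats = repeats + 1 if cur == prev else 1
        if repeats >= limit:
            return True
    return False
```

Once the limit is hit, the server cancels the completion instead of streaming repeats forever, which is what frees the slot for other queued requests.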

Tested with this code from #1910

import requests
import json

country = "france"
schema = {
    "city": {
        "type": "string",
        "description": "Name of the city"
    },
    "lat": {
        "type": "float",
        "description": "Decimal Latitude of the city"
    },
    "lon": {
        "type": "float",
        "description": "Decimal Longitude of the city"
    }
}
payload = {
    "model": "mistral-no-repeat",
    "messages": [
        {"role": "system", "content": f"You are a helpful AI assistant. The user will enter a country name and the assistant will return the decimal latitude and decimal longitude of the capital of the country. Output in JSON using the schema defined here: {schema}."},
        {"role": "user", "content": "japan"},
        {"role": "assistant", "content": "{\"city\": \"Tokyo\", \"lat\": 35.6748, \"lon\": 139.7624}"},
        {"role": "user", "content": country},
    ],
    "format": "json",
    "stream": False
}
response = requests.post("http://localhost:11434/api/chat", json=payload)
response.raise_for_status()
chat = response.json()
try:
    message_content_json = json.loads(chat['message']['content'])
    print(message_content_json)
except json.JSONDecodeError:
    print("JSONDecodeError: The content is not in proper JSON format.")

Output is more reliable, with occasional JSON-format failures that can be handled with a retry:

bruce@Bruces-MBP triage % poetry run python3 make_json_request.py
{'city': 'Paris', ' lat ': 48.8566, ' lon': 2.3522}
bruce@Bruces-MBP triage % poetry run python3 make_json_request.py
{'city': 'Paris', 'lat': 48.8566, ' lon ': 2.3522}
bruce@Bruces-MBP triage % poetry run python3 make_json_request.py
{'city': 'Paris', 'lat': 48.8566, ' lon ': 2.3522}
bruce@Bruces-MBP triage % poetry run python3 make_json_request.py
{'city': 'Paris', ' lat': 48.8566, ' lon': 2.3522}
bruce@Bruces-MBP triage % poetry run python3 make_json_request.py
{'city': 'Paris', 'lat': 48.8566, ' lon ': 2.3522}
bruce@Bruces-MBP triage % poetry run python3 make_json_request.py
{'city': 'Paris', ' lat ': 48.8566, ' lon': 2.3522}
bruce@Bruces-MBP triage % poetry run python3 make_json_request.py
JSONDecodeError: The content is not in proper JSON format.
bruce@Bruces-MBP triage % poetry run python3 make_json_request.py
{'city': 'Paris', ' lat ': 48.8534, ' lon': 2.3522}
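The retry the transcript suggests can be wrapped in a small helper. This is a sketch, with `fetch` standing in for the HTTP call in the test script (the helper name and retry count are illustrative):

```python
import json

def parse_with_retry(fetch, retries=3):
    """Call fetch() (which returns the raw message content string) and parse it
    as JSON, retrying until it parses or the attempts are exhausted."""
    last_error = None
    for _ in range(retries):
        content = fetch()
        try:
            return json.loads(content)
        except json.JSONDecodeError as err:
            last_error = err
    raise last_error
```

Note the stray whitespace in keys like ' lat ' in the outputs above: the content is valid JSON, so a parse-level retry will not catch it, and callers may still want to strip key whitespace after parsing.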

resolves #1910


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 05:08:27 -05:00

Reference: github-starred/ollama#73350