[PR #14040] [MERGED] server: optimize chatPrompt to reduce tokenization calls #40357

Closed
opened 2026-04-23 01:16:16 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14040
Author: @jmorganca
Created: 2/3/2026
Status: Merged
Merged: 2/4/2026
Merged by: @jmorganca

Base: main ← Head: ollama-efficient-render


📝 Commits (3)

  • 7a2bd10 server: optimize chatPrompt to reduce tokenization calls
  • 62c0b20 Update server/prompt.go
  • 3cf7b10 fix loop to include lastMsgIdx in iteration

📊 Changes

2 files changed (+84 additions, -14 deletions)


📝 server/prompt.go (+18 -14)
📝 server/prompt_test.go (+66 -0)

📄 Description

Change the truncation algorithm to start with all messages and remove from the front until it fits, rather than adding messages one at a time from the back. This reduces tokenization calls from O(n) to O(1) in the common case where all messages fit in context.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-23 01:16:16 -05:00

Reference: github-starred/ollama#40357