[PR #7500] [MERGED] prompt: Use a single token when estimating mllama context size #59140

Closed
opened 2026-04-29 14:01:55 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/7500
Author: @jessegross
Created: 11/5/2024
Status: Merged
Merged: 11/5/2024
Merged by: @jessegross

Base: main ← Head: jessegross/mllama_tokens


📝 Commits (1)

  • ac7be77 prompt: Use a single token when estimating mllama context size

📊 Changes

1 file changed (+11 additions, -3 deletions)


📝 server/prompt.go (+11 -3)

📄 Description

Currently we assume that images take 768 tokens of context size for the purposes of clipping old messages that exceed the context window. However, our mllama implementation stores the full image embedding in a single token. As a result, there is significant waste of context space.

Ideally, we would handle this more generically and have the implementation report the number of tokens. However, at the moment this would just result in a similar set of 'if' conditions in the runner plus APIs to report it back. So for now, we just keep this simple.
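The heuristic described above can be sketched as a small Go helper. This is an illustrative assumption, not the actual `server/prompt.go` diff: the function name, signature, and the model-family check are hypothetical, but it captures the behavior the PR describes (mllama packs the whole image embedding into one token, while other models keep the prior 768-token estimate).

```go
package main

import "fmt"

// imageNumTokens estimates how many context tokens a single image consumes
// when clipping old messages to fit the context window.
//
// Hypothetical sketch of the heuristic in this PR: the mllama implementation
// stores the full image embedding in a single token, so images cost 1 token;
// other multimodal models fall back to the older 768-token estimate.
func imageNumTokens(modelFamily string) int {
	if modelFamily == "mllama" {
		return 1
	}
	return 768
}

func main() {
	fmt.Println(imageNumTokens("mllama")) // mllama: one token per image
	fmt.Println(imageNumTokens("llava"))  // other models: 768-token estimate
}
```

Reporting the token count from the model implementation itself, as the description suggests, would replace this per-family branch with a generic query, at the cost of new runner APIs.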


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 14:01:56 -05:00

Reference: github-starred/ollama#59140