[PR #9604] [MERGED] model: Update encoder cache to use multimodal input processing handler #23551

Closed
opened 2026-04-19 17:04:44 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/9604
Author: @jessegross
Created: 3/9/2025
Status: Merged
Merged: 3/10/2025
Merged by: @jessegross

Base: main ← Head: jessegross/encoder


📝 Commits (1)

  • 63cf507 model: Update encoder cache to use multimodal input processing handler

📊 Changes

13 files changed (+157 additions, -160 deletions)

View changed files

📝 kvcache/cache.go (+2 -1)
📝 kvcache/causal.go (+7 -6)
📝 kvcache/causal_test.go (+2 -1)
📝 kvcache/encoder.go (+6 -3)
📝 kvcache/wrapper.go (+5 -4)
➕ model/input/input.go (+37 -0)
📝 model/model.go (+24 -59)
📝 model/model_test.go (+2 -1)
📝 model/models/llama/model.go (+2 -1)
📝 model/models/mllama/model.go (+7 -6)
📝 runner/ollamarunner/cache.go (+7 -6)
📝 runner/ollamarunner/cache_test.go (+36 -36)
📝 runner/ollamarunner/runner.go (+20 -36)

📄 Description

The encoder cache needs to know the position of images in the input stream so that it knows when to delete them. Previously images didn't have a position, so we implied one by breaking batches before an image and then assuming the image was in the first position. However, multimodal objects are now given explicit positions in the input stream, so we can use that instead.
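The position-based lookup described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `input` struct and `lastImagePos` helper are hypothetical stand-ins for the structures in `model/input/input.go`, showing how an explicit position in the input stream lets the cache find the image governing a given token without breaking batches.

```go
package main

import "fmt"

// input is a hypothetical stand-in for an entry in the input stream;
// the real field names in model/input/input.go may differ.
type input struct {
	token      int32
	multimodal any // non-nil when this position holds image data
}

// lastImagePos returns the position of the most recent multimodal entry
// at or before pos, or -1 if there is none. With explicit positions, the
// encoder cache can decide when an image is no longer needed without
// assuming the image sits at the start of a batch.
func lastImagePos(inputs []input, pos int) int {
	for i := pos; i >= 0; i-- {
		if inputs[i].multimodal != nil {
			return i
		}
	}
	return -1
}

func main() {
	inputs := []input{
		{token: 1}, {multimodal: "img0"}, {token: 2}, {token: 3},
	}
	fmt.Println(lastImagePos(inputs, 3)) // image found at position 1
}
```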

Breaking batches was also a way to simulate a cross-attention mask for mllama. However, since mllama only supports a single sequence and a single image, this mask doesn't serve any real purpose. Removing the batch break does not appear to affect the quality of the output.

Most of this is simply moving the input data structures to a new package to avoid import cycles.
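The motivation for the new package is the classic import-cycle break: input types used by both the model and cache layers move into a leaf package that either side can import. As a rough sketch (the actual fields in `model/input/input.go` may differ, and `MultimodalHash` here is a hypothetical cache key):

```go
package main

import "fmt"

// Input sketches the kind of structure that might live in the new
// model/input package. Keeping these shared types in a leaf package
// lets model, kvcache, and the runner all import them without
// importing each other, which is what avoids the cycle.
type Input struct {
	Token          int32
	Multimodal     any    // decoded image data, if any
	MultimodalHash uint64 // hypothetical key for the encoder cache
}

func main() {
	// A batch mixing text tokens and an image at an explicit position.
	batch := []Input{
		{Token: 42},
		{Multimodal: []float32{0.1, 0.2}, MultimodalHash: 0xabc},
		{Token: 7},
	}
	fmt.Println(len(batch)) // prints 3
}
```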


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 17:04:44 -05:00

Reference: github-starred/ollama#23551