[PR #10882] [MERGED] kvcache: Skip computing causal mask for worst case graph reservation #44641

Closed
opened 2026-04-25 00:15:02 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/10882
Author: @jessegross
Created: 5/27/2025
Status: Merged
Merged: 5/27/2025
Merged by: @jessegross

Base: main ← Head: jessegross/cache_start


📝 Commits (1)

  • 9cb1752 kvcache: Skip computing causal mask for worst case graph reservation

📊 Changes

1 file changed (+12 additions, -1 deletions)


📝 kvcache/causal.go (+12 -1)

📄 Description

Computing an attention mask for a large context and the maximum batch size is expensive: over 100ms. Models like Gemma3 that have multiple types of caches and custom attention masks need to do this 4 times, so this adds approximately 500ms to startup time when using a 128k context.

When we are reserving the worst case graph, we don't need the mask's contents, only its shape, so we can skip computing it.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 00:15:02 -05:00

Reference: github-starred/ollama#44641