[PR #12543] [MERGED] kvcache: Clean up sliding window state with independent batches #24406

Closed
opened 2026-04-19 17:33:30 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12543
Author: @jessegross
Created: 10/8/2025
Status: Merged
Merged: 10/8/2025
Merged by: @jessegross

Base: main ← Head: jessegross/swa


📝 Commits (1)

  • cde4464 kvcache: Clean up sliding window state with independent batches

📊 Changes

2 files changed (+139 additions, -28 deletions)

📝 kvcache/causal.go (+42 -14)
📝 kvcache/causal_test.go (+97 -14)

📄 Description

Sliding window models (e.g. gpt-oss, gemma3) remove tokens that have fallen outside the cache's window each time we start a new forward pass.

The cache storage needs to hold the window size for each sequence plus the batch size, since the batch needs to attend to the full window. This means that more than one window's worth of tokens is stored while a batch is being processed.
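The sizing rule above can be sketched as simple arithmetic. This is only an illustration of the description, not the code in `kvcache/causal.go`; the function name `cacheCapacity` and its parameters are hypothetical.

```go
package main

import "fmt"

// cacheCapacity is a hypothetical helper (not part of ollama) that applies the
// rule from the description: one full window per sequence, plus room for the
// batch that is currently being processed.
func cacheCapacity(windowSize, numSequences, batchSize int) int {
	return windowSize*numSequences + batchSize
}

func main() {
	// Assumed example values: a window of 4 tokens, 2 sequences, a batch of 3 tokens.
	fmt.Println(cacheCapacity(4, 2, 3)) // prints 11
}
```

The extra `batchSize` cells are what allow a sequence to temporarily hold more than its window while a batch is in flight.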

When the next batch arrives, we currently look only at the sequences in the incoming batch when sliding the window forward. However, we also need to clean up the other sequences that may still occupy space in the batch processing buffer, so that each sequence uses only its window size of storage. Failing to do this can result in "no kv cache slot found" errors.
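A minimal sketch of the difference between trimming only the incoming batch's sequences and trimming every sequence, assuming a simplified cache made of flat cells. The types and functions here (`cell`, `newCells`, `trimToWindow`) are hypothetical and do not mirror the real structures in `kvcache/causal.go`.

```go
package main

import "fmt"

// cell is a hypothetical cache slot: the sequence that owns it and the token
// position it stores. A cell with used == false is free.
type cell struct {
	seq  int
	pos  int32
	used bool
}

// newCells fills a toy cache with positions 0..lastPos for each of numSeqs sequences.
func newCells(numSeqs int, lastPos int32) []cell {
	var cells []cell
	for s := 0; s < numSeqs; s++ {
		for p := int32(0); p <= lastPos; p++ {
			cells = append(cells, cell{seq: s, pos: p, used: true})
		}
	}
	return cells
}

// trimToWindow frees every used cell whose position is outside the last
// `window` positions of its sequence. Sequences that are absent from lastPos
// are left untouched. It returns the number of cells freed.
func trimToWindow(cells []cell, lastPos map[int]int32, window int32) int {
	freed := 0
	for i := range cells {
		c := &cells[i]
		if !c.used {
			continue
		}
		last, ok := lastPos[c.seq]
		if !ok {
			continue
		}
		if c.pos <= last-window {
			*c = cell{}
			freed++
		}
	}
	return freed
}

func main() {
	const window = 4

	// Both sequences hold 6 positions (0..5), i.e. 2 more than the window,
	// as can happen right after a batch has been processed.
	batchOnly := newCells(2, 5)
	allSeqs := newCells(2, 5)

	// Trimming only the sequence in the incoming batch (sequence 0).
	fmt.Println(trimToWindow(batchOnly, map[int]int32{0: 5}, window)) // prints 2

	// Trimming every sequence known to the cache.
	fmt.Println(trimToWindow(allSeqs, map[int]int32{0: 5, 1: 5}, window)) // prints 4
}
```

In the first call, sequence 1 keeps two cells beyond its window. In a cache sized for one window per sequence plus one batch, leftover cells of that kind are what can leave no free slot for the next batch.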

Fixes: #10127


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 17:33:30 -05:00

Reference: github-starred/ollama#24406