[PR #14912] [CLOSED] Fix RotatingKVCache trim and wrap handling for sliding window attention #77205

Closed
opened 2026-05-05 09:53:27 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14912
Author: @dhiltgen
Created: 3/17/2026
Status: Closed

Base: main ← Head: fix-rotating-kv-cache


📝 Commits (1)

  • 996242f Fix RotatingKVCache trim and wrap handling for sliding window attention

📊 Changes

3 files changed (+303 additions, -8 deletions)


📝 x/mlxrunner/cache.go (+12 -5)
📝 x/mlxrunner/cache/cache.go (+37 -3)
➕ x/mlxrunner/cache/cache_test.go (+254 -0)

📄 Description

RotatingKVCache had several issues when the sliding window wrapped past maxSize:

  • concat() didn't handle the wrapped case (offset > maxSize), causing incorrect key/value concatenation. Now linearizes the circular buffer before appending when wrapped.
  • Trim() could corrupt state on a wrapped cache by producing negative growth sizes in subsequent update() calls. Now returns 0 to signal failure, letting the caller do a full cache reset instead.
  • State() used offset directly as the slice bound, which could exceed the physical array dimension after wrapping. Now uses min(offset, dim).
  • update() growth calculation could go negative when offset exceeded maxSize with keys still smaller than maxSize. Added a safety clamp.
  • kvCache.trimToPrefix() now returns bool to indicate success/failure, and callers handle trim failure with a full cache reset rather than silently continuing with corrupted state.
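
The wrap handling described in the list above can be illustrated with a minimal Go sketch over a plain int ring buffer. This is not the code from x/mlxrunner/cache/cache.go: the `ring` type and its methods are hypothetical names chosen for illustration, and real caches hold key/value tensors rather than ints. It only shows the three ideas in isolation: linearizing a wrapped buffer into temporal order before appending, bounding the visible state by min(offset, dim), and refusing to trim a wrapped buffer so the caller can reset.

```go
package main

import "fmt"

// ring is a hypothetical stand-in for a sliding-window cache.
// offset counts every token ever written; buf holds at most maxSize of them.
type ring struct {
	buf     []int
	maxSize int
	offset  int
}

// update writes one token, overwriting the oldest slot once the window is full.
func (r *ring) update(tok int) {
	if len(r.buf) < r.maxSize {
		r.buf = append(r.buf, tok)
	} else {
		r.buf[r.offset%r.maxSize] = tok
	}
	r.offset++
}

// wrapped reports whether writes have gone past the physical buffer size.
func (r *ring) wrapped() bool { return r.offset > r.maxSize }

// linearize returns a copy of the contents in temporal order, oldest first.
func (r *ring) linearize() []int {
	if !r.wrapped() {
		return append([]int(nil), r.buf...)
	}
	head := r.offset % r.maxSize // index of the oldest element
	out := make([]int, 0, r.maxSize)
	out = append(out, r.buf[head:]...)
	return append(out, r.buf[:head]...)
}

// state bounds the slice by min(offset, len(buf)) instead of offset alone.
func (r *ring) state() []int {
	n := r.offset
	if n > len(r.buf) {
		n = len(r.buf)
	}
	return r.buf[:n]
}

// trim drops the last n tokens and returns n, or returns 0 to signal failure
// when the buffer has wrapped (or n is out of range) so the caller can reset.
func (r *ring) trim(n int) int {
	if r.wrapped() || n <= 0 || n > r.offset {
		return 0
	}
	r.offset -= n
	r.buf = r.buf[:r.offset]
	return n
}

func main() {
	r := &ring{maxSize: 4}
	for i := 0; i < 6; i++ {
		r.update(i)
	}
	fmt.Println(r.linearize()) // [2 3 4 5]
	fmt.Println(len(r.state())) // 4, even though offset is 6

	// Trim fails on a wrapped buffer, so fall back to a full reset.
	if r.trim(1) == 0 {
		r = &ring{maxSize: r.maxSize}
	}
	fmt.Println(r.offset) // 0
}
```

Returning 0 from `trim` mirrors the failure signal the description mentions; the sketch treats any wrapped buffer as untrimmable, which is the conservative reading of that bullet.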

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 09:53:27 -05:00
Reference: github-starred/ollama#77205