[PR #15591] [MERGED] mlx: fix RotatingKVCache.concat() dropping context on mid-rotation #61920

Closed
opened 2026-04-29 16:54:09 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15591
Author: @dhiltgen
Created: 4/14/2026
Status: Merged
Merged: 4/15/2026
Merged by: @dhiltgen

Base: main ← Head: gemma4-rotating-cache-fix


📝 Commits (1)

  • 9c6164a mlx: fix RotatingKVCache.concat() dropping context on mid-rotation

📊 Changes

2 files changed (+355 additions, -2 deletions)


📝 x/mlxrunner/cache/cache.go (+17 -2)
➕ x/mlxrunner/cache/rotating_multiturn_test.go (+338 -0)

📄 Description

After the rotating buffer has wrapped (c.offset > c.maxSize), a subsequent
Update() with L > 1 new tokens took a path that sliced the buffer to
[0, c.idx). That discarded every slot in [c.idx, Dim), losing older tokens
that are still inside the window and that the first query of the new batch
needs for its sliding-window attention.

Linearize the circular buffer into logical (oldest-first) order in that
wrapped case, so the existing trim + concat preserves the last
(maxSize - 1) old tokens. When the buffer has not yet wrapped
(c.offset <= c.maxSize), slots [c.idx, Dim) contain only grow padding or
stale data left by a rewind, so keep dropping them.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 16:54:09 -05:00
Reference: github-starred/ollama#61920