[PR #11530] [MERGED] kvcache: Group shift operations into batches #75866

Closed
opened 2026-05-05 08:17:36 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/11530
Author: @jessegross
Created: 7/25/2025
Status: Merged
Merged: 7/25/2025
Merged by: @jessegross

Base: main
Head: jessegross/shift


📝 Commits (1)

  • f56333b (https://github.com/ollama/ollama/commit/f56333b8a1825da566552ceb1758546998a25756) kvcache: Group shift operations into batches

📊 Changes

1 file changed (+36 additions, -29 deletions)

📝 kvcache/causal.go (+36 -29)

📄 Description

Currently, when we need to shift the cache, we issue a single RoPE operation over the entire cache (per layer). In some cases, this creates a compute graph that is larger than the forward pass, since the forward pass works in batches. Because we don't account for shifting in our memory estimates, this can cause a crash if we run out of memory.

By limiting the RoPE calls to batch-size chunks, we ensure that the shift never exceeds the size of the forward pass, since the forward pass also contains a RoPE of the same size. This does not have a significant impact on performance, since RoPE is a math operation whose cost is mostly proportional to the size of its inputs.
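
The chunking idea can be sketched roughly as follows. This is an illustration only, not the code from `kvcache/causal.go`: the names `chunk`, `splitIntoBatches`, `shiftInBatches`, and the `applyRoPE` callback are hypothetical, and the per-cell `offsets` slice simply stands in for whatever position deltas the real shift computes.

```go
package main

// chunk is a half-open range [Start, End) of cache cells.
// Hypothetical type used only for this sketch.
type chunk struct {
	Start, End int
}

// splitIntoBatches divides [0, size) into consecutive chunks that each
// cover at most batchSize cells. The last chunk may be shorter.
func splitIntoBatches(size, batchSize int) []chunk {
	if size <= 0 || batchSize <= 0 {
		return nil
	}
	chunks := make([]chunk, 0, (size+batchSize-1)/batchSize)
	for start := 0; start < size; start += batchSize {
		end := start + batchSize
		if end > size {
			end = size
		}
		chunks = append(chunks, chunk{Start: start, End: end})
	}
	return chunks
}

// shiftInBatches calls applyRoPE once per chunk instead of once for the
// whole cache, so no single call sees more than batchSize cells.
// applyRoPE is an assumed callback standing in for the real RoPE op.
func shiftInBatches(offsets []int32, batchSize int, applyRoPE func(start int, offsets []int32)) {
	for _, c := range splitIntoBatches(len(offsets), batchSize) {
		applyRoPE(c.Start, offsets[c.Start:c.End])
	}
}
```

With a cache of 10 cells and a batch size of 4, for example, the callback would run over the ranges [0, 4), [4, 8), and [8, 10), so the largest single RoPE input matches the batch size rather than the cache size.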

In theory, defrag could have the same issue, since it also creates a compute graph outside of the forward pass. However, since it consists only of copies, it does not require any working space.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 08:17:36 -05:00
Reference: github-starred/ollama#75866