[PR #6831] [MERGED] cache: Clear old KV cache entries when evicting a slot #22774

Closed
opened 2026-04-19 16:33:37 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/6831
Author: @jessegross
Created: 9/16/2024
Status: Merged
Merged: 9/16/2024
Merged by: @jessegross

Base: jmorganca/llama ← Head: jessegross/kv


📝 Commits (1)

  • 6c78236 cache: Clear old KV cache entries when evicting a slot

📊 Changes

1 file changed (+1 addition, -0 deletions)

📝 llama/runner/cache.go (+1 -0)

📄 Description

When forking a cache entry and no empty slot is available, we evict the least recently used slot and copy the KV entries from the closest match into it. However, this copy does not overwrite the evicted slot's existing entries; it only adds new ones. We therefore need to clear the old slot's entries before copying.
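The eviction path described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not the code from `llama/runner/cache.go`: `kvCache`, `cell`, `slot`, `seqCp`, `seqRm`, `stats`, and `forkIntoLRU` are hypothetical names; each KV cell is modeled as a position plus the set of slot IDs that reference it; the copy only adds references, as the description states; and the LRU victim is chosen from the slots other than the best match. Setting `clearFirst` to false reproduces the behavior before this change.

```go
package main

import "fmt"

// cell is one entry in a toy KV cache: a token position plus the IDs of the
// slots (sequences) that reference it. A cell referenced by no slot is free.
type cell struct {
	pos  int
	seqs map[int]bool
}

// kvCache is a hypothetical, simplified shared KV cache.
type kvCache struct{ cells []cell }

// seqCp makes dst reference every cell src references below position n.
// Like the copy described in the PR, it only adds: dst's existing cells stay.
func (kv *kvCache) seqCp(src, dst, n int) {
	for _, c := range kv.cells {
		if c.seqs[src] && c.pos < n {
			c.seqs[dst] = true // maps are references, so this updates the stored cell
		}
	}
}

// seqRm drops all of seq's references, freeing cells no other slot uses.
func (kv *kvCache) seqRm(seq int) {
	for _, c := range kv.cells {
		delete(c.seqs, seq)
	}
}

// stats reports how many cells seq references and how many are in use overall.
func (kv *kvCache) stats(seq int) (refs, used int) {
	for _, c := range kv.cells {
		if c.seqs[seq] {
			refs++
		}
		if len(c.seqs) > 0 {
			used++
		}
	}
	return refs, used
}

// slot is a hypothetical cache slot; its id doubles as its KV sequence ID.
type slot struct {
	id, lastUsed int
	inputs       []int
}

// forkIntoLRU seeds the least recently used slot (other than the best match)
// with the KV entries of the slot sharing the longest prefix with prompt.
// clearFirst=false mimics the pre-fix behavior described in the PR.
func forkIntoLRU(kv *kvCache, slots []*slot, prompt []int, now int, clearFirst bool) *slot {
	var best, lru *slot
	bestLen := -1
	for _, s := range slots {
		n := 0
		for n < len(s.inputs) && n < len(prompt) && s.inputs[n] == prompt[n] {
			n++
		}
		if n > bestLen {
			best, bestLen = s, n
		}
	}
	for _, s := range slots {
		if s != best && (lru == nil || s.lastUsed < lru.lastUsed) {
			lru = s
		}
	}
	if lru == nil {
		return nil // this sketch needs at least two slots
	}
	if clearFirst {
		kv.seqRm(lru.id) // the fix: drop the evicted slot's stale entries first
	}
	kv.seqCp(best.id, lru.id, bestLen)
	lru.inputs = append([]int(nil), prompt[:bestLen]...)
	lru.lastUsed = now
	return lru
}

func main() {
	for _, clearFirst := range []bool{false, true} {
		kv := &kvCache{}
		a := &slot{id: 0, lastUsed: 2, inputs: []int{1, 2, 3, 4}}
		b := &slot{id: 1, lastUsed: 1, inputs: []int{9, 8, 7, 6, 5}}
		for _, s := range []*slot{a, b} {
			for pos := range s.inputs {
				kv.cells = append(kv.cells, cell{pos: pos, seqs: map[int]bool{s.id: true}})
			}
		}
		dst := forkIntoLRU(kv, []*slot{a, b}, []int{1, 2, 3, 7}, 3, clearFirst)
		refs, used := kv.stats(dst.id)
		fmt.Printf("clearFirst=%v: slot %d refs %d cells, %d in use\n", clearFirst, dst.id, refs, used)
	}
}
```

Running `main` should print `clearFirst=false: slot 1 refs 8 cells, 9 in use` followed by `clearFirst=true: slot 1 refs 3 cells, 4 in use`. Without the clear, the evicted slot keeps its five stale cells next to the three copied ones, so those cells are never returned to the free pool; with the clear, only the copied prefix remains referenced.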

This change fixes two issues:

  • The KV cache fills up and runs out of space even though we think we are managing it correctly
  • Performance gets worse over time as we use new cache entries that are not hot in the processor caches

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 16:33:38 -05:00

Reference: github-starred/ollama#22774