[PR #7767] [MERGED] KV Cache Fixes #17783

Closed
opened 2026-04-16 06:13:52 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/7767
Author: @jessegross
Created: 11/20/2024
Status: Merged
Merged: 11/20/2024
Merged by: @jessegross

Base: main ← Head: jessegross/kv


📝 Commits (5)

  • e1a72ba runner.go: Use correct index when retrieving embedding results
  • f34bbe2 runner.go: Retry decoding after defragmentation if needed
  • 065451f runner.go: Hard fail on errors rather than potentially infinite looping
  • 85a9b21 runner.go: Don't add inputs to cache view until actually processed
  • 26e0461 runner.go: Truncate inputs that exceed context rather than shifting
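
As an illustration of the last commit's idea (truncating over-long inputs up front instead of shifting the cache repeatedly), here is a minimal sketch. It is not the code from this PR: the function name `truncateInputs`, the parameters `numCtx` and `numKeep`, and the policy of keeping the first `numKeep` inputs plus the most recent tail are all assumptions made for the example.

```go
package main

// truncateInputs is a hypothetical helper, not the PR's implementation.
// If inputs fit within numCtx, they are returned unchanged. Otherwise the
// first numKeep inputs are preserved and the remainder of the budget is
// filled with the most recent inputs, so the result has exactly numCtx items.
// Assumes numCtx > 0.
func truncateInputs(inputs []int, numCtx, numKeep int) []int {
	if len(inputs) <= numCtx {
		return inputs
	}

	// Clamp numKeep into [0, numCtx] so the slicing below cannot panic.
	if numKeep < 0 {
		numKeep = 0
	}
	if numKeep > numCtx {
		numKeep = numCtx
	}

	// Build a fresh slice so the caller's backing array is not overwritten.
	out := make([]int, 0, numCtx)
	out = append(out, inputs[:numKeep]...)

	tail := numCtx - numKeep
	out = append(out, inputs[len(inputs)-tail:]...)
	return out
}
```

With ten inputs `1..10`, `numCtx = 6` and `numKeep = 2`, this sketch returns `[1 2 7 8 9 10]`. Doing the cut once, before any decoding, avoids processing tokens that would later be discarded, which is one plausible reading of the slowdown described for inputs much longer than the context size.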

📊 Changes

4 files changed (+94 additions, -37 deletions)


📝 integration/context_test.go (+31 -0)
📝 llama/llama.go (+10 -4)
📝 llama/runner/cache.go (+12 -4)
📝 llama/runner/runner.go (+41 -29)

📄 Description

Users have reported a number of errors related to the KV cache such as:

  • Error: "could not find a KV slot for the batch - try reducing the size of the batch or increase the context. code: 1"
  • Hanging due to infinite loops
  • Output that ends unexpectedly
  • Slower performance than before when passing inputs that are much longer than the context size

This PR aims to fix these problems and to continue making this area of the code less error-prone.
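
The "retry after defragmentation" and "hard fail" commits suggest a control flow along the following lines. This is only a sketch under stated assumptions: the `decoder` interface, the `ErrKvCacheFull` sentinel, `decodeWithRetry`, and `fakeDecoder` are hypothetical names invented for the example and are not the runner's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrKvCacheFull is a hypothetical sentinel standing in for the
// "could not find a KV slot for the batch" condition.
var ErrKvCacheFull = errors.New("could not find a KV slot for the batch")

// decoder is a hypothetical abstraction over the model context.
type decoder interface {
	Decode(batch []int) error
	Defrag()
}

// decodeWithRetry tries to decode once. If the cache reports it is full,
// it defragments and retries exactly once. Any remaining error is returned
// to the caller instead of being retried in a loop, so a persistent
// failure cannot turn into a hang.
func decodeWithRetry(d decoder, batch []int) error {
	err := d.Decode(batch)
	if errors.Is(err, ErrKvCacheFull) {
		d.Defrag()
		err = d.Decode(batch)
	}
	if err != nil {
		return fmt.Errorf("decode failed: %w", err)
	}
	return nil
}

// fakeDecoder is a test double that fails a fixed number of times.
type fakeDecoder struct {
	failuresLeft int
	defrags      int
}

func (f *fakeDecoder) Decode(batch []int) error {
	if f.failuresLeft > 0 {
		f.failuresLeft--
		return ErrKvCacheFull
	}
	return nil
}

func (f *fakeDecoder) Defrag() { f.defrags++ }
```

In this sketch, a `fakeDecoder{failuresLeft: 1}` succeeds after one defragmentation, while `fakeDecoder{failuresLeft: 2}` returns a wrapped error after a single retry. The bounded retry is the design point: the caller sees a definite failure rather than looping indefinitely.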


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/ollama#17783