[PR #14791] fix: enable KV cache for CPU mode #14841

Open
opened 2026-04-13 01:03:44 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14791
Author: @Lightspace260
Created: 2026-03-12
Status: 🔄 Open

Base: main ← Head: fix-cpu-kv-cache


📝 Commits (1)

  • 6923d9d fix: enable KV cache for CPU mode

📊 Changes

3 files changed (+42 additions, -26 deletions)

View changed files

📝 llm/server.go (+6 -0)
📝 runner/ollamarunner/cache.go (+33 -24)
📝 runner/ollamarunner/runner.go (+3 -2)

📄 Description


This PR fixes an issue where the KV cache was not being properly initialized when running in CPU-only mode. Previously, the cache was only enabled if a GPU was detected, leading to significant performance degradation for CPU users.

Key Changes

  • Forced KV cache initialization even when no GPU is present.
  • Added a fallback in-memory cache for CPU mode.
  • Ensured enabled flag is correctly set for CPU backend.
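The changes above can be sketched as follows. This is a minimal illustration, not the actual diff: the type and function names (`backendConfig`, `enableKVCache`) are hypothetical stand-ins, since the real identifiers live in `runner/ollamarunner/cache.go` and are not shown in this PR summary.

```go
package main

import "fmt"

// backendConfig is a hypothetical stand-in for the runner's cache
// configuration; the real struct in runner/ollamarunner/cache.go differs.
type backendConfig struct {
	gpuDetected    bool
	kvCacheEnabled bool
}

// enableKVCache sketches the fix described above: the enabled flag is
// set unconditionally (previously it was only set when a GPU was
// detected), and CPU mode falls back to an in-memory cache.
func enableKVCache(cfg *backendConfig) string {
	cfg.kvCacheEnabled = true
	if cfg.gpuDetected {
		return "gpu"
	}
	return "in-memory" // fallback cache for CPU-only mode
}

func main() {
	cpu := &backendConfig{gpuDetected: false}
	fmt.Println(enableKVCache(cpu), cpu.kvCacheEnabled)
}
```

The key point is that cache selection and cache enablement are decoupled: GPU detection now only chooses the backing store, never whether caching happens at all.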

Performance

  • Token generation latency improved from ~2s to ~0.5s in local CPU tests.

/claim #14780


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 01:03:44 -05:00

Reference: github-starred/ollama#14841