[PR #5492] [MERGED] Use slot with cached prompt instead of least recently used #11799

Closed
opened 2026-04-12 23:39:10 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/5492
Author: @jmorganca
Created: 7/5/2024
Status: Merged
Merged: 7/5/2024
Merged by: @jmorganca

Base: main ← Head: jmorganca/mru-slot


📝 Commits (2)

  • f536794 Use common prefix to select slot
  • f691bcf actually report longest

📊 Changes

1 file changed (+39 additions, -1 deletions)


📝 llm/ext_server/server.cpp (+39 -1)

📄 Description

This chooses the slot with the longest common prompt prefix instead of the least recently used slot, which maximizes how long a single "conversation" keeps reusing its cached prompt.
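The selection rule described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the `Slot` struct, its fields, and the `common_prefix_len` and `select_slot` helpers are hypothetical names chosen for this example, and the fallback to the least recently used slot when no slot shares a prefix is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a server slot; the real struct in
// server.cpp is not shown in this PR page and has more fields.
struct Slot {
    int id;
    std::vector<int> cache_tokens; // tokens assumed to be held in the slot's cache
    std::int64_t t_last_used;      // timestamp of the slot's last use
    bool available;                // false while the slot is processing a request
};

// Number of leading tokens that a and b have in common.
static std::size_t common_prefix_len(const std::vector<int>& a,
                                     const std::vector<int>& b) {
    std::size_t n = 0;
    while (n < a.size() && n < b.size() && a[n] == b[n]) {
        ++n;
    }
    return n;
}

// Pick the available slot whose cached tokens share the longest prefix
// with the incoming prompt. If no available slot shares any prefix,
// fall back to the least recently used available slot (assumed behavior).
// Returns nullptr when no slot is available.
static Slot* select_slot(std::vector<Slot>& slots,
                         const std::vector<int>& prompt) {
    Slot* best = nullptr;
    std::size_t longest = 0;
    for (Slot& s : slots) {
        if (!s.available) continue;
        const std::size_t len = common_prefix_len(s.cache_tokens, prompt);
        if (len > longest) {
            longest = len;
            best = &s;
        }
    }
    if (best != nullptr) {
        return best;
    }
    for (Slot& s : slots) {
        if (!s.available) continue;
        if (best == nullptr || s.t_last_used < best->t_last_used) {
            best = &s;
        }
    }
    return best;
}
```

With this rule, a follow-up message in a conversation tends to land on the slot that already holds the earlier turns, so only the new suffix of the prompt needs to be evaluated.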

Future improvements:

  • Clone slots and their cache
  • Avoid requests "stealing" slots from each other because they have a small but common prefix
  • Account for context shifts in the cache matching

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/ollama#11799