[PR #1642] [MERGED] Add Cache option #1573 #57332

Closed
opened 2026-04-29 11:54:50 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/1642
Author: @K0IN
Created: 12/20/2023
Status: Merged
Merged: 12/22/2023
Merged by: @BruceMacD

Base: main ← Head: main


📝 Commits (1)

  • 504497f: Add Cache flag to api

📊 Changes

3 files changed (+6 additions, -2 deletions)


📝 api/types.go (+2 -0)
📝 docs/api.md (+2 -1)
📝 llm/ext_server.go (+2 -1)

📄 Description

This PR adds an API option, "cache", which allows the llama.cpp server to cache the prompt evaluation and the response.
For some models this speeds up follow-up calls considerably: when the same prompt (or even a shared prefix of it) is sent again over the API, subsequent calls skip re-evaluating the prompt.
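
For context, here is a minimal sketch of how a client might enable the option over the API. Only the "cache" option name comes from this PR; the endpoint, model name, and placement of the flag under "options" are assumptions based on Ollama's documented /api/generate request format.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Hypothetical request enabling the new "cache" option; resending the
	// same (or a partially identical) prompt should then reuse the cached
	// prompt evaluation instead of recomputing it.
	body, _ := json.Marshal(map[string]any{
		"model":  "llama2", // placeholder model name
		"prompt": "Why is the sky blue?",
		"stream": false,
		"options": map[string]any{
			"cache": true, // option introduced by this PR
		},
	})

	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```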

This PR also adds the commands /set cache and /set nocache so users can enable or disable prompt caching from the official CLI.

  • Add a new entry "cache" to the options object that is passed to the worker (a rough sketch follows this list)
  • Add /set cache and /set nocache commands to the REPL CLI
  • Update the docs
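
For illustration, a minimal sketch of what the api/types.go side of this might look like, assuming the option is exposed as a boolean field on the options struct. The field name, JSON tag, and surrounding struct shape are guesses from the PR description, not the exact diff.

```go
// Rough sketch only: the real struct in api/types.go carries many more
// runner and sampling options than shown here.
package api

type Options struct {
	// ...existing runner and sampling options...

	// Cache asks the llama.cpp server to keep the prompt evaluation (and
	// response) cached, so a repeated or partially repeated prompt can skip
	// re-evaluating the shared prefix.
	Cache bool `json:"cache,omitempty"`
}
```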

This is a partial fix for Enable prompt cache #1573; we may need to patch llama.cpp at some point to get full flexibility.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 11:54:50 -05:00

Reference: github-starred/ollama#57332