[PR #4632] [CLOSED] make cache_prompt as an option #22088

Closed
opened 2026-04-19 16:04:55 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/4632
Author: @Windfarer
Created: 5/25/2024
Status: Closed

Base: main ← Head: option-prompt-cache


📝 Commits (1)

  • 5328d7f add disable cache prompt env

📊 Changes

3 files changed (+29 additions, -16 deletions)

View changed files

📝 docs/api.md (+1 -0)
📝 envconfig/config.go (+27 -15)
📝 llm/server.go (+1 -1)

📄 Description

When we make requests following the request-reproducible-outputs example in docs/api.md, the responses are not actually reproducible.
It's easy to reproduce this issue; all of the following steps use temperature=0 and seed=1:

  1. request with prompt A
  2. request with prompt B
  3. request with prompt A
  4. request with prompt A
  5. request with prompt A

We will find that outputs 1, 3, and 4 all differ from one another, while outputs 4 and 5 are identical.
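
For reference, a minimal reproduction sketch in Go; the model name and prompt strings are placeholders, and the request shape follows the public /api/generate endpoint described in docs/api.md:

```go
// Minimal reproduction sketch: send prompts A, B, A, A, A with
// temperature=0 and seed=1 and compare the responses.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func generate(prompt string) (string, error) {
	body, err := json.Marshal(map[string]any{
		"model":  "llama3", // placeholder model name
		"prompt": prompt,
		"stream": false,
		"options": map[string]any{
			"temperature": 0,
			"seed":        1,
		},
	})
	if err != nil {
		return "", err
	}
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Response, nil
}

func main() {
	// Steps 1-5 from the description above.
	for i, p := range []string{"prompt A", "prompt B", "prompt A", "prompt A", "prompt A"} {
		text, err := generate(p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("step %d (%s): %q\n", i+1, p, text)
	}
}
```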
I ran some tests against llama.cpp and found that when cache_prompt is enabled, the response is affected by the previous input.
So cache_prompt should be made an option, allowing users to disable it when they need reproducible outputs.
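
A minimal sketch of the shape of the change, assuming a hypothetical OLLAMA_NOPROMPTCACHE environment variable; the actual variable and helper names added by commit 5328d7f may differ:

```go
// Sketch of the proposed change. OLLAMA_NOPROMPTCACHE is an assumed name
// for the new environment variable, not necessarily the one in the commit.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strconv"
)

// NoPromptCache would live in envconfig/config.go: it reports whether the
// user asked to disable prompt caching via the (hypothetical) env var.
func NoPromptCache() bool {
	v, err := strconv.ParseBool(os.Getenv("OLLAMA_NOPROMPTCACHE"))
	return err == nil && v
}

func main() {
	// In llm/server.go the completion request sent to the llama.cpp server
	// would set cache_prompt from this setting instead of always enabling it.
	req := map[string]any{
		"prompt":       "why is the sky blue?",
		"cache_prompt": !NoPromptCache(),
	}
	b, _ := json.Marshal(req)
	fmt.Println(string(b))
}
```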


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 16:04:55 -05:00

Reference: github-starred/ollama#22088