[PR #4632] make cache_prompt as an option #11548

Closed
opened 2026-04-12 23:32:09 -05:00 by GiteaMirror · 0 comments
Owner

Original Pull Request: https://github.com/ollama/ollama/pull/4632

State: closed
Merged: No


When we send requests following the [request-reproducible-outputs](https://github.com/ollama/ollama/blob/main/docs/api.md#request-reproducible-outputs) documentation, the responses are not reproducible.
It's easy to reproduce this issue; all of the following steps set `temperature=0` and `seed=1`:

  1. request with prompt A
  2. request with prompt B
  3. request with prompt A
  4. request with prompt A
  5. request with prompt A

We will find that the outputs of steps 1, 3, and 4 all differ from one another, while the outputs of steps 4 and 5 are the same.
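
For reference, here is a minimal reproduction sketch (not part of the original PR) written against Ollama's documented `/api/generate` endpoint. The model name and prompt strings are placeholders; `temperature` and `seed` are set exactly as in the steps above.

```go
// repro.go — minimal sketch reproducing the issue against a local Ollama server.
// /api/generate, "temperature", "seed", and "stream" are documented Ollama API
// fields; the model name and prompts are placeholders.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// generate sends a single non-streaming /api/generate request with the given
// prompt and sampling options and returns the model's response text.
func generate(prompt string, options map[string]any) (string, error) {
	body, _ := json.Marshal(map[string]any{
		"model":   "llama3", // assumed: any locally pulled model
		"prompt":  prompt,
		"stream":  false,
		"options": options,
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Response, nil
}

func main() {
	opts := map[string]any{"temperature": 0, "seed": 1}
	// Steps 1-5: A, B, A, A, A. With temperature=0 and seed=1 one would
	// expect every "prompt A" response to be identical.
	for i, p := range []string{"prompt A", "prompt B", "prompt A", "prompt A", "prompt A"} {
		response, err := generate(p, opts)
		if err != nil {
			panic(err)
		}
		fmt.Printf("step %d (%s): %s\n", i+1, p, response)
	}
}
```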
I ran some tests against llama.cpp and found that when `cache_prompt` is enabled, the response is affected by the previous input.
So we should make `cache_prompt` an option, allowing users to disable it when they need reproducible outputs.
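
If the option were exposed through the request `options`, disabling the prompt cache for a reproducibility-sensitive request might look like the snippet below, reusing the `generate` helper from the sketch above. Note that `cache_prompt` as a per-request Ollama option is exactly what this (unmerged) PR proposes; it is not part of the released API.

```go
// Hypothetical usage: "cache_prompt" here is the option proposed by this PR,
// not a field of the released Ollama API.
opts := map[string]any{
	"temperature":  0,
	"seed":         1,
	"cache_prompt": false, // proposed: do not reuse cached state from the previous request
}
response, err := generate("prompt A", opts)
if err != nil {
	panic(err)
}
fmt.Println(response)
```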

GiteaMirror added the pull-request label 2026-04-12 23:32:09 -05:00