[PR #7073] [MERGED] runner.go: Don't cache prompts for embeddings #74596

Closed
opened 2026-05-05 06:45:50 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/7073
Author: @jessegross
Created: 10/2/2024
Status: Merged
Merged: 10/2/2024
Merged by: @jessegross

Base: jmorganca/llama ← Head: jessegross/cache


📝 Commits (1)

  • da76e98 runner.go: Don't cache prompts for embeddings

📊 Changes

2 files changed (+13 additions, -7 deletions)

View changed files

📝 llama/runner/cache.go (+5 -1)
📝 llama/runner/runner.go (+8 -6)

📄 Description

Our integration with server.cpp implicitly disables prompt caching because it is not part of the JSON object being parsed; this makes the Go runner behave similarly.

Prompt caching has been seen to affect the results of text completions on certain hardware. The results are not wrong either way, but they are non-deterministic. However, embeddings appear to be affected even on hardware that does not show this behavior for completions. For now, it is best to maintain consistency with the existing behavior.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 06:45:50 -05:00
Reference: github-starred/ollama#74596