[GH-ISSUE #15758] Ollama's Cloud doesn't report number of cached tokens #72103

Open
opened 2026-05-05 03:29:05 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @drifkin on GitHub (Apr 23, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15758

What is the issue?

Behind the scenes, requests are sped up with caches, but we currently always report 0 cached tokens.

GiteaMirror added the bug label 2026-05-05 03:29:05 -05:00
Author
Owner

@anishesg commented on GitHub (Apr 25, 2026):

PR #15768 exposes the cached input count from loadCache as a new PromptCachedCount field in the completion response. The change is backward compatible (the field uses omitempty).


Reference: github-starred/ollama#72103