[PR #15967] server: cache show responses #77670

Open
opened 2026-05-05 10:20:44 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15967
Author: @ParthSareen
Created: 5/4/2026
Status: 🔄 Open

Base: main ← Head: parth-cache-show-endpoint


📝 Commits (2)

  • 3249066 server: cache show responses
  • d72fd2f Address comments

📊 Changes

5 files changed (+1252 additions, -16 deletions)

➕ server/model_caches.go (+27 -0)
📝 server/model_recommendations_test.go (+2 -2)
➕ server/model_show_cache.go (+694 -0)
➕ server/model_show_cache_test.go (+498 -0)
📝 server/routes.go (+31 -14)

📄 Description

Summary

Adds a server-side cache for full /api/show responses, aimed at making model capability and metadata lookups fast for launch flows.

Architecture

  • Introduces a small modelCaches manager owned by Server (sketched after this list).
  • Adds modelShowCache, alongside the existing model recommendations cache.
  • Wires ShowHandler through the cache for cacheable requests.
  • Persists snapshots under ~/.ollama/cache/show/:
    • local.json
    • cloud.json
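
The description doesn't include the wiring itself; below is a minimal sketch of what the manager could look like. Only the modelCaches and modelShowCache names come from the PR; every other identifier (including the modelRecommendationsCache placeholder) is an assumption for illustration:

```go
package server

import (
	"sync"

	"github.com/ollama/ollama/api"
)

// modelRecommendationsCache stands in for the pre-existing recommendations
// cache; its real definition lives elsewhere in the server package.
type modelRecommendationsCache struct{}

// modelCaches bundles the per-model caches owned by Server.
type modelCaches struct {
	recommendations *modelRecommendationsCache // pre-existing cache
	show            *modelShowCache            // added by this PR
}

// modelShowCache keeps local and cloud entries in separate maps (per the
// "Important Decisions" below) and is persisted as local.json / cloud.json
// snapshots under ~/.ollama/cache/show/.
type modelShowCache struct {
	mu    sync.RWMutex
	local map[string]cachedShow // key: canonical name + manifest digest + verbose
	cloud map[string]cachedShow // key: normalized cloud base model + verbose
}

// cachedShow wraps one cached /api/show response.
type cachedShow struct {
	resp api.ShowResponse // cloned at cache boundaries
}
```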

Cache Design

Local models:

  • Cache key is canonical model name + manifest digest + verbose (see the key sketch after this list).
  • Manifest digest is the freshness boundary.
  • verbose=false and verbose=true are separate entries.
  • Startup hydration scans manifests in the background and skips unchanged cached entries.
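
As a sketch of how such a key could be composed (the helper name and format are assumptions, not the PR's actual code):

```go
package server

import "fmt"

// localShowKey illustrates the key shape described above: canonical model
// name + manifest digest + verbose flag. Because the digest is part of the
// key, a re-pulled model naturally misses the old entry, which is how the
// manifest digest acts as the freshness boundary.
func localShowKey(canonicalName, manifestDigest string, verbose bool) string {
	return fmt.Sprintf("%s@%s?verbose=%t", canonicalName, manifestDigest, verbose)
}
```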

Cloud models:

  • Cache key is normalized cloud base model + verbose.
  • Explicit :cloud and legacy -cloud normalize to the same cloud key.
  • Uses stale-while-revalidate behavior (sketched after this list):
    • warm hit returns cached data immediately
    • background refresh updates that model
    • cold miss preserves existing synchronous proxy behavior
  • Startup hydration fetches cloud /api/tags, then /api/show for returned models with bounded concurrency.
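
A sketch of that stale-while-revalidate lookup, reusing the hypothetical types from the manager sketch above (all identifiers are illustrative, not the PR's actual API):

```go
// getCloud illustrates the SWR flow: a warm hit returns cached data
// immediately and refreshes that model in the background; a cold miss
// falls through to a synchronous fetch, matching the pre-cache proxy
// behavior.
func (c *modelShowCache) getCloud(key string, fetch func() (*api.ShowResponse, error)) (*api.ShowResponse, error) {
	c.mu.RLock()
	entry, ok := c.cloud[key]
	c.mu.RUnlock()

	if ok {
		// Warm hit: kick off a background refresh for this model.
		go func() {
			if fresh, err := fetch(); err == nil {
				c.mu.Lock()
				c.cloud[key] = cachedShow{resp: *fresh}
				c.mu.Unlock()
			}
		}()
		// Shallow copy for brevity; the PR clones at cache boundaries,
		// so a real implementation would deep-copy here.
		resp := entry.resp
		return &resp, nil
	}

	// Cold miss: synchronous fetch, as before this PR.
	fresh, err := fetch()
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.cloud[key] = cachedShow{resp: *fresh}
	c.mu.Unlock()
	return fresh, nil
}
```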

Important Decisions

  • Cache full api.ShowResponse, not a reduced capabilities subset.
  • Keep local and cloud caches in separate maps so local qwen3.5 and cloud qwen3.5:cloud cannot collide.
  • Do not cache requests with System or Options overlays.
  • Do not serve stale cloud data when cloud is disabled.
  • Clone responses at cache boundaries so handler mutations cannot leak into cached values (one possible approach is sketched below).
  • Snapshot parse/write failures are non-fatal and only log warnings.
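
One common way to implement the clone-at-boundary decision is a JSON round trip; this is a sketch of the idea, not necessarily how the PR does it:

```go
package server

import (
	"encoding/json"

	"github.com/ollama/ollama/api"
)

// cloneShowResponse deep-copies a ShowResponse so handler mutations cannot
// leak into cached values. Hypothetical; the PR may clone field-by-field.
func cloneShowResponse(in *api.ShowResponse) (*api.ShowResponse, error) {
	b, err := json.Marshal(in)
	if err != nil {
		return nil, err
	}
	var out api.ShowResponse
	if err := json.Unmarshal(b, &out); err != nil {
		return nil, err
	}
	return &out, nil
}
```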

Tests

Covers local cache hits, manifest digest invalidation, startup hydration, verbose variants, overlay bypass, cloud SWR, cold cloud fallback, cloud hydration, cloud disabled behavior, local/cloud key separation, and snapshot failure tolerance.
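
For flavor, a digest-invalidation check along these lines (hypothetical; it reuses the localShowKey sketch above and is not taken from model_show_cache_test.go):

```go
package server

import "testing"

// A new manifest digest must produce a new cache key, so stale entries
// are never served for a re-pulled model.
func TestLocalShowKeyChangesWithDigest(t *testing.T) {
	before := localShowKey("qwen3.5", "sha256:aaa", false)
	after := localShowKey("qwen3.5", "sha256:bbb", false)
	if before == after {
		t.Fatal("expected a new manifest digest to produce a new cache key")
	}
}
```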

Verified:

  • go test ./server
  • go test ./cmd/launch

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 10:20:44 -05:00
Reference: github-starred/ollama#77670