[PR #15868] [MERGED] server/launch: add model recommendations cache endpoint #77629

Closed
opened 2026-05-05 10:18:10 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15868
Author: @ParthSareen
Created: 4/28/2026
Status: Merged
Merged: 4/29/2026
Merged by: @ParthSareen

Base: main ← Head: parth-model-rec-cache


📝 Commits (1)

  • 17fc4d6 server/launch: add model recommendations cache endpoint

📊 Changes

10 files changed (+1250 additions, -38 deletions)

View changed files

📝 api/client.go (+10 -0)
📝 api/types.go (+14 -0)
📝 cmd/launch/command_test.go (+4 -0)
📝 cmd/launch/hermes_test.go (+8 -9)
📝 cmd/launch/launch.go (+63 -4)
📝 cmd/launch/launch_test.go (+56 -0)
📝 cmd/launch/models.go (+76 -20)
➕ server/model_recommendations.go (+401 -0)
➕ server/model_recommendations_test.go (+586 -0)
📝 server/routes.go (+32 -5)

📄 Description

Summary

This PR adds an experimental model recommendations API backed by a server-side cache, and updates ollama launch to consume it for model recommendations.

The goal is to keep recommendations fresh without breaking launch flows if recommendation fetches fail.

What Changed

  • Added GET /api/experimental/model-recommendations on the local server.
    • Fetches to ollama.com can be disabled with OLLAMA_NO_CLOUD; in that case a fallback list is used for launch.
  • Added a server-side modelRecommendationsCache with:
    • startup defaults,
    • on-disk snapshot persistence (~/.ollama/cache/model-recommendations.json),
    • periodic background refresh,
    • read-triggered SWR refresh.
  • Added API client/types support for recommendations.
  • Updated launch model selection to request server recommendations.
  • Mapped recommendation metadata into launch model items (description, vram, context_length, max_output_tokens).
  • Added dynamic cloud limit updates from recommendation payloads.
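The field names above (description, vram, context_length, max_output_tokens) come from this PR, but the response schema itself is not shown here. As a minimal sketch, launch-side decoding of one recommendation item might look like the following; the ModelRecommendation struct and parseRecommendations helper are hypothetical names, not the PR's actual types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelRecommendation is a hypothetical shape for one item returned by
// GET /api/experimental/model-recommendations. The field names mirror the
// metadata this PR maps into launch model items; the real schema may differ.
type ModelRecommendation struct {
	Name            string `json:"name"`
	Description     string `json:"description"`
	VRAM            uint64 `json:"vram"`
	ContextLength   int    `json:"context_length"`
	MaxOutputTokens int    `json:"max_output_tokens"`
}

// parseRecommendations decodes a recommendations payload, returning an
// error rather than panicking so callers can fail open to fallback data.
func parseRecommendations(data []byte) ([]ModelRecommendation, error) {
	var recs []ModelRecommendation
	if err := json.Unmarshal(data, &recs); err != nil {
		return nil, err
	}
	return recs, nil
}

func main() {
	payload := []byte(`[{"name":"example-model","description":"general chat model","vram":8589934592,"context_length":8192,"max_output_tokens":4096}]`)
	recs, err := parseRecommendations(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(recs[0].Name, recs[0].ContextLength) // example-model 8192
}
```

Returning an error from the parse step matters here because of the fail-open design below: a decode failure should route launch to its built-in list, not abort the flow.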

Key Design Decisions

  1. Server owns recommendation freshness.
  • Launch should not independently fetch from ollama.com.
  • The local server is the source of truth for recommendation payloads.
  2. SWR + timer refresh model.
  • Keep the 4h background refresh loop.
  • Also trigger async revalidation on endpoint reads.
  • Reads return cached data immediately; refresh is non-blocking.
  3. Fail-open behavior for launch.
  • Recommendation issues should not block ollama launch.
  • If the recommendation request fails or returns empty, launch falls back to built-in recommendation data.
  4. Keep fallback safety on both sides.
  • Server has hardcoded defaults for cold start/offline safety.
  • Launch keeps fallback behavior to avoid user-facing breakage when endpoint data is unavailable.
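Decisions 2 and 3 together describe a stale-while-revalidate read path: reads return the cached snapshot immediately, kick off a non-blocking refresh, and a refresh failure silently keeps the stale data. A minimal sketch of that pattern follows; swrCache, Get, and revalidate are illustrative names, not the PR's actual modelRecommendationsCache implementation, and the 4h timer loop and on-disk snapshot are omitted:

```go
package main

import (
	"fmt"
	"sync"
)

// swrCache returns cached data immediately on read and triggers a
// non-blocking background refresh (stale-while-revalidate). A periodic
// refresh loop would call the same revalidate path on a timer.
type swrCache struct {
	mu      sync.Mutex
	data    []string                  // current snapshot (startup defaults at first)
	refresh func() ([]string, error)  // e.g. fetch from ollama.com
	pending bool                      // at most one in-flight refresh
	done    chan struct{}             // refresh-completion signal (test hook)
}

// Get returns the current snapshot and triggers async revalidation.
func (c *swrCache) Get() []string {
	c.mu.Lock()
	snapshot := c.data
	if !c.pending {
		c.pending = true
		go c.revalidate()
	}
	c.mu.Unlock()
	return snapshot
}

func (c *swrCache) revalidate() {
	fresh, err := c.refresh()
	c.mu.Lock()
	if err == nil { // fail open: keep stale data on error
		c.data = fresh
	}
	c.pending = false
	c.mu.Unlock()
	c.done <- struct{}{}
}

func main() {
	c := &swrCache{
		data:    []string{"fallback-model"}, // startup defaults
		refresh: func() ([]string, error) { return []string{"fresh-model"}, nil },
		done:    make(chan struct{}, 4),
	}
	first := c.Get() // stale read, refresh starts in the background
	<-c.done         // wait for the refresh (deterministic for this demo)
	second := c.Get()
	fmt.Println(first[0], second[0]) // fallback-model fresh-model
}
```

The key property is that the read path never blocks on the network: a cold or offline server serves its hardcoded defaults, and a failed refresh leaves the last good snapshot in place.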

Notes / Follow-ups

  • We can later move to a generalized server cache manager if more runtime caches are added.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.


Reference: github-starred/ollama#77629