[PR #14574] [MERGED] don't require pulling stubs for cloud models #14730

opened 2026-04-13 01:01:37 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14574
Author: @drifkin
Created: 3/3/2026
Status: Merged
Merged: 3/3/2026
Merged by: @drifkin

Base: main ← Head: drifkin/cloud-no-pull


📝 Commits (8)

  • cea732d don't require pulling stubs for cloud models
  • 03add3f consolidate pull logic into confirmAndPull helper
  • 1ff3c33 skip local existence checks for cloud models
  • 2e429a7 support optionally pulling stubs for newly-style names
  • 7eb03a5 Fix server alias syncing
  • a3d3efb Update cmd/cmd.go
  • 7da8c3e address comments
  • 2ef9350 improve some naming

📊 Changes

23 files changed (+2843 additions, -108 deletions)

View changed files

📝 cmd/cmd.go (+39 -5)
📝 cmd/cmd_test.go (+171 -3)
📝 cmd/config/claude.go (+6 -11)
📝 cmd/config/config.go (+3 -0)
📝 cmd/config/droid.go (+1 -3)
📝 cmd/config/integrations.go (+22 -31)
📝 cmd/config/integrations_test.go (+131 -8)
📝 cmd/config/opencode.go (+6 -7)
📝 cmd/tui/tui.go (+9 -2)
➕ internal/modelref/modelref.go (+115 -0)
➕ internal/modelref/modelref_test.go (+268 -0)
📝 middleware/anthropic.go (+2 -1)
➕ server/cloud_proxy.go (+460 -0)
➕ server/cloud_proxy_test.go (+154 -0)
📝 server/create.go (+12 -5)
➕ server/model_resolver.go (+81 -0)
➕ server/model_resolver_test.go (+170 -0)
📝 server/routes.go (+123 -27)
📝 server/routes_cloud_test.go (+988 -0)
📝 server/routes_create_test.go (+37 -0)

...and 3 more files

📄 Description

This is the first in a series of PRs that will better integrate Ollama's cloud into the API and CLI. Previously there was a layer of indirection: you first had to pull a "stub" model that contains a reference to a cloud model. With this change you don't have to pull first; you can use a cloud model directly in routes like /api/chat and /api/show. This change respects https://github.com/ollama/ollama/pull/14221, so if cloud is disabled, these models won't be accessible.

There's also a new, simpler pass-through proxy that doesn't convert requests before they reach the cloud models, since the cloud models themselves already support various formats (e.g., v1/chat/completions, Open Responses, etc.). This helps prevent issues caused by double conversion (e.g., v1/chat/completions converted to api/chat on the client, then converted back to a v1/chat/completions response after calling the cloud, instead of the cloud model handling the original v1/chat/completions request directly).

There's now a notion of "source tags", which can be mixed with existing tags. So instead of having different formats like gpt-oss:20b-cloud vs. kimi-k2.5:cloud (a -cloud suffix vs. a :cloud tag), you can now specify cloud simply by appending :cloud. This PR doesn't change model resolution yet, but sets us up to allow for things like omitting the non-source tag, which would make something like ollama run gpt-oss:cloud work the same way that ollama run gpt-oss already works today.
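The source-tag parsing described above can be sketched roughly as follows. This is a simplified stand-in for the real parser (internal/modelref in the file list); the type and function names are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// ModelRef is a hypothetical split of a model name into its regular tag
// and an optional "source" tag (cloud/local).
type ModelRef struct {
	Name   string // base model name, e.g. "gpt-oss"
	Tag    string // regular tag, e.g. "20b" (may be empty)
	Source string // "cloud", "local", or "" when unspecified
}

// ParseRef accepts source tags in any position ("gpt-oss:20b:cloud" or
// "gpt-oss:cloud:20b"), supports the legacy "<tag>-cloud" form, and
// rejects conflicting source tags such as "m:cloud:local".
func ParseRef(s string) (ModelRef, error) {
	parts := strings.Split(s, ":")
	ref := ModelRef{Name: parts[0]}
	for _, p := range parts[1:] {
		src, tag := "", ""
		switch {
		case p == "cloud" || p == "local":
			src = p
		case strings.HasSuffix(p, "-cloud"): // legacy form, e.g. "20b-cloud"
			src, tag = "cloud", strings.TrimSuffix(p, "-cloud")
		default:
			tag = p
		}
		if src != "" {
			if ref.Source != "" && ref.Source != src {
				return ModelRef{}, fmt.Errorf("conflicting source tags in %q", s)
			}
			ref.Source = src
		}
		if tag != "" {
			ref.Tag = tag
		}
	}
	return ref, nil
}

func main() {
	r, _ := ParseRef("gpt-oss:20b:cloud")
	fmt.Printf("%+v\n", r)
}
```

Resolution can then branch on Source: an explicit "cloud" skips local existence checks and routes to the proxy, while "" keeps today's local-first behavior.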

More detailed changes:

  • Added a shared model selector parser in types/modelselector:
    • supports :cloud and :local
    • accepts source tags in any position
    • supports legacy :<tag>-cloud
    • rejects conflicting source tags
  • Integrated selector handling across server inference/show routes:
    • GenerateHandler, ChatHandler, EmbedHandler, EmbeddingsHandler, ShowHandler
  • Added explicit-cloud passthrough proxy for ollama.com:
    • same-endpoint forwarding for /api/*, /v1/*, and /v1/messages
    • normalizes model (and name for /api/show) before forwarding
    • forwards request headers except hop-by-hop/proxy-managed headers
    • uses bounded response-header timeout
    • handles auth failures in a friendly way
  • Preserved cloud-disable behavior (OLLAMA_NO_CLOUD)
  • Updated create flow to support FROM ...:cloud model sources (though this flow still uses the legacy proxy, since supporting Modelfile overrides is more complicated with the direct proxy approach)
  • Updated CLI/TUI/config cloud detection to use shared selector logic
  • Updated CLI preflight behavior so explicit cloud requests do not
    auto-pull local stubs

What's next?

  • Cloud discovery/listing and cache-backed ollama ls / /api/tags
  • Modelfile overlay support for virtual cloud models on OpenAI/Anthropic
    request families
  • Recommender/default-selection behavior for ambiguous model families
  • Fully remove the legacy flow

Fixes: https://github.com/ollama/ollama/issues/13801


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 01:01:37 -05:00

Reference: github-starred/ollama#14730