mirror of https://github.com/ollama/ollama.git synced 2026-03-09 03:12:11 -05:00

Devon Rifkin 8207e55ec7 don't require pulling stubs for cloud models (#14574 )

* don't require pulling stubs for cloud models

This is the first in a series of PRs that will better integrate Ollama's
cloud into the API and CLI. Previously there was a layer of
indirection where you'd first have to pull a "stub" model that contains
a reference to a cloud model. With this change, you don't have to pull
first; you can use a cloud model directly in routes like `/api/chat`
and `/api/show`. This change respects
<https://github.com/ollama/ollama/pull/14221>, so if cloud is disabled,
these models won't be accessible.

There's also a new, simpler pass-through proxy that doesn't convert
requests before they reach the cloud models, which themselves
already support various formats (e.g., `v1/chat/completions` or Open
Responses). This will help prevent issues caused by double
converting (e.g., `v1/chat/completions` converted to `api/chat` on the
client, then calling cloud and converting back to a
`v1/chat/completions` response instead of the cloud model handling the
original `v1/chat/completions` request first).
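
A minimal Python sketch of the pass-through idea, not the actual Go implementation: the request keeps its original path and body, and only hop-by-hop headers are dropped before forwarding. The function name `build_upstream_request` and the `PROXY_MANAGED` set are assumptions made for illustration. The hop-by-hop header names are the standard HTTP ones.

```python
from typing import Dict, Tuple

# Standard HTTP hop-by-hop headers; these describe a single connection
# and must not be forwarded by a proxy.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}

# Assumption: headers the proxy sets itself rather than copying from the client.
PROXY_MANAGED = {"host", "content-length"}


def build_upstream_request(
    path: str,
    headers: Dict[str, str],
    body: bytes,
    base: str = "https://ollama.com",
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for forwarding to the same endpoint upstream."""
    # Any header named in the Connection header is hop-by-hop as well.
    named = {
        token.strip().lower()
        for key, value in headers.items()
        if key.lower() == "connection"
        for token in value.split(",")
        if token.strip()
    }
    drop = HOP_BY_HOP | PROXY_MANAGED | named
    forwarded = {k: v for k, v in headers.items() if k.lower() not in drop}
    # The path and body are passed through unchanged: no format conversion.
    return base.rstrip("/") + path, forwarded, body
```

Because the body is untouched, a `v1/chat/completions` request arrives upstream as a `v1/chat/completions` request, which is what avoids the double conversion described above.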

There's now a notion of "source tags", which can be mixed with existing
tags. So instead of having different formats like `gpt-oss:20b-cloud` vs.
`kimi-k2.5:cloud` (`-cloud` suffix vs. `:cloud`), you can now specify
cloud by simply appending `:cloud`. This PR doesn't change model
resolution yet, but sets us up to allow for things like omitting the
non-source tag, which would make something like `ollama run
gpt-oss:cloud` work the same way that `ollama run gpt-oss` already works
today.

More detailed changes:

- Added a shared model selector parser in `types/modelselector`:
  - supports `:cloud` and `:local`
  - accepts source tags in any position
  - supports legacy `:<tag>-cloud`
  - rejects conflicting source tags
- Integrated selector handling across server inference/show routes:
  - `GenerateHandler`, `ChatHandler`, `EmbedHandler`,
    `EmbeddingsHandler`, `ShowHandler`
- Added explicit-cloud passthrough proxy for ollama.com:
  - same-endpoint forwarding for `/api/*`, `/v1/*`, and `/v1/messages`
  - normalizes `model` (and `name` for `/api/show`) before forwarding
  - forwards request headers except hop-by-hop/proxy-managed headers
  - uses bounded response-header timeout
  - handles auth failures in a friendly way
- Preserved cloud-disable behavior (`OLLAMA_NO_CLOUD`)
- Updated create flow to support `FROM ...:cloud` model sources (though
  this flow still uses the legacy proxy; supporting Modelfile overrides
  is more complicated with the direct proxy approach)
- Updated CLI/TUI/config cloud detection to use shared selector logic
- Updated CLI preflight behavior so explicit cloud requests do not
  auto-pull local stubs
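
The selector rules listed above can be sketched in Python. This is an illustration only, not the code in `types/modelselector`: the names `ModelSelector` and `parse_model_selector` are hypothetical, and the sketch assumes the name has no registry host with a port (which would add extra colons).

```python
from dataclasses import dataclass
from typing import Optional

SOURCE_TAGS = {"cloud", "local"}
LEGACY_SUFFIX = "-cloud"


@dataclass(frozen=True)
class ModelSelector:
    name: str              # model name with source tags removed
    source: Optional[str]  # "cloud", "local", or None when unspecified


def parse_model_selector(raw: str) -> ModelSelector:
    base, *tags = raw.split(":")
    if not base:
        raise ValueError("model name is empty")

    sources = set()
    kept = []
    for tag in tags:
        if tag in SOURCE_TAGS:
            # Source tags may appear in any position.
            sources.add(tag)
        elif tag.endswith(LEGACY_SUFFIX) and len(tag) > len(LEGACY_SUFFIX):
            # Legacy form: `:<tag>-cloud` means tag `<tag>` from the cloud.
            sources.add("cloud")
            kept.append(tag[: -len(LEGACY_SUFFIX)])
        else:
            kept.append(tag)

    if len(sources) > 1:
        raise ValueError(f"conflicting source tags in {raw!r}")

    source = next(iter(sources), None)
    return ModelSelector(":".join([base, *kept]), source)
```

Under these assumptions, `gpt-oss:20b:cloud`, `gpt-oss:cloud:20b`, and the legacy `gpt-oss:20b-cloud` all parse to the same selector, while `gemma3:cloud:local` is rejected.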

What's next?

- Cloud discovery/listing and cache-backed `ollama ls` / `/api/tags`
- Modelfile overlay support for virtual cloud models on OpenAI/Anthropic
  request families
- Recommender/default-selection behavior for ambiguous model families
- Fully remove the legacy flow

Fixes: https://github.com/ollama/ollama/issues/13801

* consolidate pull logic into confirmAndPull helper

pullIfNeeded and ShowOrPull shared identical confirm-and-pull logic.
Extract confirmAndPull to eliminate the duplication.

* skip local existence checks for cloud models

ModelExists and the TUI's modelExists both check the local model list,
which causes cloud models to appear missing. Return true early for
explicit cloud models so the TUI displays them beside the integration
name and skips re-prompting the model picker on relaunch.

* support optionally pulling stubs for new-style names

We now normalize names like `<family>:<size>:cloud` into legacy-style
names like `<family>:<size>-cloud` for pulling and deleting (this also
supports stripping `:local`). Support for pulling cloud models is
temporary; once we integrate properly into `/api/tags` we won't need
this anymore.
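
The normalization described here could look roughly like the following Python sketch. The function name `to_legacy_name` is hypothetical, and the handling of a bare `<family>:cloud` (left unchanged) is an assumption based on names like `kimi-k2.5:cloud` mentioned earlier.

```python
def to_legacy_name(name: str) -> str:
    """Map a new-style name to the legacy-style name used for pull/delete."""
    family, *tags = name.split(":")
    is_cloud = "cloud" in tags
    # Drop the source tags; whatever remains is the ordinary tag (e.g. a size).
    rest = [t for t in tags if t not in ("cloud", "local")]

    if not is_cloud:
        # `:local` is stripped; names without a source tag pass through.
        return ":".join([family, *rest])
    if not rest:
        # Assumption: `<family>:cloud` is already a valid legacy-style name.
        return f"{family}:cloud"
    if not rest[-1].endswith("-cloud"):
        # `<family>:<size>:cloud` becomes `<family>:<size>-cloud`.
        rest[-1] += "-cloud"
    return ":".join([family, *rest])
```

For example, under these assumptions `gpt-oss:20b:cloud` maps to `gpt-oss:20b-cloud`, and `gemma3:4b:local` maps to `gemma3:4b`.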

* Fix server alias syncing

* Update cmd/cmd.go

Co-authored-by: Parth Sareen <parth.sareen@ollama.com>

* address comments

* improve some naming

---------

Co-authored-by: ParthSareen <parth.sareen@ollama.com>

2026-03-03 10:46:33 -08:00

.github

win: add curl-style install script (#14178 )

2026-02-09 15:28:11 -08:00

anthropic

anthropic: enable websearch (#14246 )

2026-02-13 19:20:46 -08:00

api

mlx: Remove peak memory from the API

2026-03-02 15:56:18 -08:00

app

fix: window app crash on startup when update is pending (#14451 )

2026-02-26 16:47:12 -05:00

auth

auth: fix problems with the ollama keypairs (#12373 )

2025-09-22 23:20:20 -07:00

cmd

don't require pulling stubs for cloud models (#14574 )

2026-03-03 10:46:33 -08:00

convert

model: support for qwen3.5 architecture (#14378 )

2026-02-24 20:08:05 -08:00

discover

CUDA: filter devices on secondary discovery (#13317 )

2025-12-03 12:58:16 -08:00

docs

runner: add token history sampling parameters to ollama runner (#14537 )

2026-03-01 19:16:07 -08:00

envconfig

add ability to disable cloud (#14221 )

2026-02-12 15:47:00 -08:00

format

chore(all): replace instances of interface with any (#10067 )

2025-04-02 09:44:27 -07:00

model: support for qwen3.5 architecture (#14378 )

2026-02-24 20:08:05 -08:00

harmony

Parser for Cogito v2 (#13145 )

2025-11-19 17:21:07 -08:00

integration

ollamarunner: Fix off by one error with numPredict

2026-02-04 17:14:24 -08:00

internal

don't require pulling stubs for cloud models (#14574 )

2026-03-03 10:46:33 -08:00

kvcache

model: support for qwen3.5 architecture (#14378 )

2026-02-24 20:08:05 -08:00

llama

models: add nemotronh architecture support (#14356 )

2026-02-22 15:09:14 -08:00

llm

mlx: Remove peak memory from the API

2026-03-02 15:56:18 -08:00

logutil

logutil: fix source field (#12279 )

2025-09-16 16:18:07 -07:00

manifest

Clean up the manifest and modelpath (#13807 )

2026-01-21 11:46:17 -08:00

middleware

don't require pulling stubs for cloud models (#14574 )

2026-03-03 10:46:33 -08:00

model: add support for qwen3.5-27b model (#14415 )

2026-02-25 01:09:58 -08:00

model

model/qwen3next: avoid crash in DeltaNet when offloading (#14541 )

2026-03-01 18:44:04 -08:00

openai

x/imagegen: add image edit capabilities (#13846 )

2026-01-22 20:35:08 -08:00

parser

Add experimental MLX backend and engine with imagegen support (#13648 )

2026-01-08 16:18:59 -08:00

progress

Add z-image image generation prototype (#13659 )

2026-01-09 21:09:46 -08:00

readline

feature: add ctrl-g to allow users to use an editor to edit their prompt (#14197 )

2026-02-11 13:04:41 -08:00

runner

runner: add token history sampling parameters to ollama runner (#14537 )

2026-03-01 19:16:07 -08:00

sample

runner: add token history sampling parameters to ollama runner (#14537 )

2026-03-01 19:16:07 -08:00

scripts

install: prevent partial download script execution (#14311 )

2026-02-18 18:32:45 -08:00

server

don't require pulling stubs for cloud models (#14574 )

2026-03-03 10:46:33 -08:00

template

template: fix args-as-json rendering (#13636 )

2026-01-06 18:33:57 -08:00

thinking

thinking: fix double emit when no opening tag

2025-08-21 21:03:12 -07:00

tokenizer

move tokenizers to separate package (#13825 )

2026-02-05 17:44:11 -08:00

tools

preserve tool definition and call JSON ordering (#13525 )

2026-01-05 18:03:36 -08:00

types

Update vendor ggml code to a5bb8ba4 (#13832 )

2026-02-02 17:31:59 -08:00

version

add version

2023-08-22 09:40:58 -07:00

don't require pulling stubs for cloud models (#14574 )

2026-03-03 10:46:33 -08:00

.dockerignore

next build (#8539 )

2025-01-29 15:03:38 -08:00

.gitattributes

.gitattributes: add app/webview to linguist-vendored (#13274 )

2025-11-29 23:46:10 -05:00

.gitignore

harmony: remove special casing in routes.go

2025-09-18 14:55:59 -07:00

.golangci.yaml

ci: restore previous linter rules (#13322 )

2025-12-03 18:55:02 -08:00

CMakeLists.txt

Revert "move tokenizers to separate package (#13825 )" (#14111 )

2026-02-05 20:49:08 -08:00

CMakePresets.json

Revert "Update vendored llama.cpp to b7847" (#14061 )

2026-02-03 18:39:36 -08:00

CONTRIBUTING.md

docs: fix typos in repository documentation (#10683 )

2025-11-15 20:22:29 -08:00

Dockerfile

update mlx-c bindings to 0.5.0 (#14380 )

2026-02-23 16:44:29 -08:00

go.mod

cmd: set codex env vars on launch and handle zstd request bodies (#14122 )

2026-02-18 17:19:36 -08:00

go.sum

cmd: set codex env vars on launch and handle zstd request bodies (#14122 )

2026-02-18 17:19:36 -08:00

LICENSE

proto -> ollama

2023-06-26 15:57:13 -04:00

main.go

lint

2024-08-01 17:06:06 -07:00

Makefile.sync

Revert "Update vendored llama.cpp to b7847" (#14061 )

2026-02-03 18:39:36 -08:00

MLX_VERSION

update mlx-c bindings to 0.5.0 (#14380 )

2026-02-23 16:44:29 -08:00

README.md

readme: update download link for macOS (#1 ) (#14271 )

2026-02-15 15:25:15 -08:00

SECURITY.md

docs: fix typos in repository documentation (#10683 )

2025-11-15 20:22:29 -08:00

README.md

Ollama

Start building with open models.

Download

macOS

curl -fsSL https://ollama.com/install.sh | sh

or download manually

Windows

irm https://ollama.com/install.ps1 | iex

or download manually

Linux

curl -fsSL https://ollama.com/install.sh | sh

Manual install instructions

Docker

The official Ollama Docker image ollama/ollama is available on Docker Hub.

Libraries

Community

Get started

ollama

You'll be prompted to run a model or connect Ollama to your existing agents or applications such as claude, codex, openclaw and more.

Coding

To launch a specific integration:

ollama launch claude

Supported integrations include Claude Code, Codex, Droid, and OpenCode.

AI assistant

Use OpenClaw to turn Ollama into a personal AI assistant across WhatsApp, Telegram, Slack, Discord, and more:

ollama launch openclaw

Chat with a model

Run and chat with Gemma 3:

ollama run gemma3

See ollama.com/library for the full list.

See the quickstart guide for more details.

REST API

Ollama has a REST API for running and managing models.

curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'

See the API documentation for all endpoints.
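
The request above sets "stream" to false, so the reply arrives as a single JSON object. When streaming is left on, the reply arrives as one JSON object per line. The following is a minimal Python sketch for joining such a stream into one string, assuming each chunk carries its text in `message.content` and the final chunk sets `done` to true. The helper name `join_streamed_chat` is hypothetical.

```python
import json
from typing import Iterable


def join_streamed_chat(lines: Iterable[bytes]) -> str:
    """Concatenate message content from newline-delimited JSON chunks."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # final chunk reached
    return "".join(parts)
```

The response object returned by `urllib.request.urlopen` can be iterated line by line, so it can be passed to this helper directly.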

Python

pip install ollama

from ollama import chat

response = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response.message.content)

JavaScript

npm i ollama

import ollama from "ollama";

const response = await ollama.chat({
  model: "gemma3",
  messages: [{ role: "user", content: "Why is the sky blue?" }],
});
console.log(response.message.content);

Supported backends

llama.cpp project founded by Georgi Gerganov.

Documentation

Community Integrations

Want to add your project? Open a pull request.

Chat Interfaces

Web

Open WebUI - Extensible, self-hosted AI interface
Onyx - Connected AI workspace
LibreChat - Enhanced ChatGPT clone with multi-provider support
Lobe Chat - Modern chat framework with plugin ecosystem (docs)
NextChat - Cross-platform ChatGPT UI (docs)
Perplexica - AI-powered search engine, open-source Perplexity alternative
big-AGI - AI suite for professionals
Lollms WebUI - Multi-model web interface
ChatOllama - Chatbot with knowledge bases
Bionic GPT - On-premise AI platform
Chatbot UI - ChatGPT-style web interface
Hollama - Minimal web interface
Chatbox - Desktop and web AI client
chat - Chat web app for teams
Ollama RAG Chatbot - Chat with multiple PDFs using RAG
Tkinter-based client - Python desktop client

Desktop

Dify.AI - LLM app development platform
AnythingLLM - All-in-one AI app for Mac, Windows, and Linux
Maid - Cross-platform mobile and desktop client
Witsy - AI desktop app for Mac, Windows, and Linux
Cherry Studio - Multi-provider desktop client
Ollama App - Multi-platform client for desktop and mobile
PyGPT - AI desktop assistant for Linux, Windows, and Mac
Alpaca - GTK4 client for Linux and macOS
SwiftChat - Cross-platform including iOS, Android, and Apple Vision Pro
Enchanted - Native macOS and iOS client
RWKV-Runner - Multi-model desktop runner
Ollama Grid Search - Evaluate and compare models
macai - macOS client for Ollama and ChatGPT
AI Studio - Multi-provider desktop IDE
Reins - Parameter tuning and reasoning model support
ConfiChat - Privacy-focused with optional encryption
LLocal.in - Electron desktop client
MindMac - AI chat client for Mac
Msty - Multi-model desktop client
BoltAI for Mac - AI chat client for Mac
IntelliBar - AI-powered assistant for macOS
Kerlig AI - AI writing assistant for macOS
Hillnote - Markdown-first AI workspace
Perfect Memory AI - Productivity AI personalized by screen and meeting history

Mobile

Ollama Android Chat - One-click Ollama on Android

SwiftChat, Enchanted, Maid, Ollama App, Reins, and ConfiChat listed above also support mobile platforms.

Code Editors & Development

Cline - VS Code extension for multi-file/whole-repo coding
Continue - Open-source AI code assistant for any IDE
Void - Open source AI code editor, Cursor alternative
Copilot for Obsidian - AI assistant for Obsidian
twinny - Copilot and Copilot chat alternative
gptel Emacs client - LLM client for Emacs
Ollama Copilot - Use Ollama as GitHub Copilot
Obsidian Local GPT - Local AI for Obsidian
Ellama Emacs client - LLM tool for Emacs
orbiton - Config-free text editor with Ollama tab completion
AI ST Completion - Sublime Text 4 AI assistant
VT Code - Rust-based terminal coding agent with Tree-sitter
QodeAssist - AI coding assistant for Qt Creator
AI Toolkit for VS Code - Microsoft-official VS Code extension
Open Interpreter - Natural language interface for computers

Libraries & SDKs

LiteLLM - Unified API for 100+ LLM providers
Semantic Kernel - Microsoft AI orchestration SDK
LangChain4j - Java LangChain (example)
LangChainGo - Go LangChain (example)
Spring AI - Spring framework AI support (docs)
LangChain and LangChain.js with example
Ollama for Ruby - Ruby LLM library
any-llm - Unified LLM interface by Mozilla
OllamaSharp for .NET - .NET SDK
LangChainRust - Rust LangChain (example)
Agents-Flex for Java - Java agent framework (example)
Elixir LangChain - Elixir LangChain
Ollama-rs for Rust - Rust SDK
LangChain for .NET - .NET LangChain (example)
chromem-go - Go vector database with Ollama embeddings (example)
LangChainDart - Dart LangChain
LlmTornado - Unified C# interface for multiple inference APIs
Ollama4j for Java - Java SDK
Ollama for Laravel - Laravel integration
Ollama for Swift - Swift SDK
LlamaIndex and LlamaIndexTS - Data framework for LLM apps
Haystack - AI pipeline framework
Firebase Genkit - Google AI framework
Ollama-hpp for C++ - C++ SDK
PromptingTools.jl - Julia LLM toolkit (example)
Ollama for R - rollama - R SDK
Portkey - AI gateway
Testcontainers - Container-based testing
LLPhant - PHP AI framework

Frameworks & Agents

AutoGPT - Autonomous AI agent platform
crewAI - Multi-agent orchestration framework
Strands Agents - Model-driven agent building by AWS
Cheshire Cat - AI assistant framework
any-agent - Unified agent framework interface by Mozilla
Stakpak - Open source DevOps agent
Hexabot - Conversational AI builder
Neuro SAN - Multi-agent orchestration (docs)

RAG & Knowledge Bases

RAGFlow - RAG engine based on deep document understanding
R2R - Open-source RAG engine
MaxKB - Ready-to-use RAG chatbot
Minima - On-premises or fully local RAG
Chipper - AI interface with Haystack RAG
ARGO - RAG and deep research on Mac/Windows/Linux
Archyve - RAG-enabling document library
Casibase - AI knowledge base with RAG and SSO
BrainSoup - Native client with RAG and multi-agent automation

Bots & Messaging

LangBot - Multi-platform messaging bots with agents and RAG
AstrBot - Multi-platform chatbot with RAG and plugins
Discord-Ollama Chat Bot - TypeScript Discord bot
Ollama Telegram Bot - Telegram bot
LLM Telegram Bot - Telegram bot for roleplay

Terminal & CLI

aichat - All-in-one LLM CLI with Shell Assistant, RAG, and AI tools
oterm - Terminal client for Ollama
gollama - Go-based model manager for Ollama
tlm - Local shell copilot
tenere - TUI for LLMs
ParLlama - TUI for Ollama
llm-ollama - Plugin for Datasette's LLM CLI
ShellOracle - Shell command suggestions
LLM-X - Progressive web app for LLMs
cmdh - Natural language to shell commands
VT - Minimal multimodal AI chat app

Productivity & Apps

AppFlowy - AI collaborative workspace, self-hostable Notion alternative
Screenpipe - 24/7 screen and mic recording with AI-powered search
Vibe - Transcribe and analyze meetings
Page Assist - Chrome extension for AI-powered browsing
NativeMind - Private, on-device browser AI assistant
Ollama Fortress - Security proxy for Ollama
1Panel - Web-based Linux server management
Writeopia - Text editor with Ollama integration
QA-Pilot - GitHub code repository understanding
Raycast extension - Ollama in Raycast
Painting Droid - Painting app with AI integrations
Serene Pub - AI roleplaying app
Mayan EDMS - Document management with Ollama workflows
TagSpaces - File management with AI tagging

Observability & Monitoring

Opik - Debug, evaluate, and monitor LLM applications
OpenLIT - OpenTelemetry-native monitoring for Ollama and GPUs
Lunary - LLM observability with analytics and PII masking
Langfuse - Open source LLM observability
HoneyHive - AI observability and evaluation for agents
MLflow Tracing - Open source LLM observability

Database & Embeddings

pgai - PostgreSQL as a vector database (guide)
MindsDB - Connect Ollama with 200+ data platforms
chromem-go - Embeddable vector database for Go (example)
Kangaroo - AI-powered SQL client

Infrastructure & Deployment

Cloud

Google Cloud
Fly.io
Koyeb
Harbor - Containerized LLM toolkit with Ollama as default backend

Package Managers

Releases 100

Latest

2025-11-05 14:33:01 -06:00

Languages

Go 80.5%

C 10%

TypeScript 5.2%

C++ 1.8%

Objective-C 0.8%

Other 1.6%
