Commit Graph

5108 Commits

Author SHA1 Message Date
Eva H
3323c1d319 app: add upgrade configuration to settings page (#13512) 2026-02-23 18:08:52 -05:00
Jesse Gross
f20dc6b698 mlx: don't default to affine quantization for unquantized models
Otherwise the BF16 version of models trigger segfaults when they
call into quantized kernels.
2026-02-23 15:03:53 -08:00
Jeffrey Morgan
4b2ac1f369 model: improvements to LFM architectures (#14368) 2026-02-23 14:38:10 -08:00
Jesse Gross
8daf47fb3a mlxrunner: Fix duplicate log prefixes and reduce log noise
Pass subprocess stdout/stderr through to the parent's stderr directly
instead of re-wrapping each line with slog. The subprocess already
writes structured slog output, so the re-wrapping produced nested
timestamps, levels, and message fields that were hard to read.

Also downgrade verbose KV cache debug logs to trace level.
2026-02-23 14:09:20 -08:00
Eva H
6c980579cd ui: use capability-based detection for web search (#14336) 2026-02-23 15:00:09 -05:00
Jesse Gross
5c73c4e2ee mlxrunner: Simplify KV cache to single-entry prefix matching
The KV cache previously used a tree structure which could
store multiple divergent sequences, which is good for cache
reuse. However, this is typically used in conjunction with
paged attention so each node in the tree can store just a
chunk of the KV cache and they can be stitched together later.
We don't currently do this, so the cache was storing copies of
the full cache for each past sequence.

This redundancy plus the lack of resource limits, caused significant
memory use as a conversation grew. Instead, this changes to store
a single entry for the cache, which can be prefix matched. Although
it is less ideal for multiple users, it largely matches Ollama's
current behavior. It can be improved as additional pieces are fleshed
out.
2026-02-23 09:50:07 -08:00
Jesse Gross
5daf59cc66 mlxrunner: Fix memory leaks with pin/sweep lifecycle management
The previous approach tracked array lifecycles through reference
counting, where each array recorded its inputs and a reference count
that was decremented as dependents were freed. This is not really
necessary as MLX tracks references internally. It is also error
prone as it is easy to create new arrays and forget to free them
when the Go variable goes out of scope.

Instead, we can pin just the arrays we want (typically outputs and
specific intermediates, like the cache). All other arrays are freed
by default when we run sweep. This avoids most causes of memory leaks
while still giving the freedom to save what we want.
2026-02-23 09:50:07 -08:00
Jeffrey Morgan
0ade9205cc models: add nemotronh architecture support (#14356) 2026-02-22 15:09:14 -08:00
Parth Sareen
06edabdde1 cmd/config: install web search plugin to user-level extensions dir (#14362) v0.17.0-rc2 v0.17.0 2026-02-22 02:17:03 -08:00
Jeffrey Morgan
8b4e5a82a8 mlx: remove noisy error output from dynamic library loading (#14346)
The recent change in #14322 added tryLoadByName() which attempts to
load libmlxc.dylib via rpath before searching directories. This is an
optimization for Homebrew installations where rpath is correctly set.

However, when rpath isn't set (which is the common case for app bundle
installations), dlopen fails and the CHECK macro prints an error to
stderr:

  ERROR - dynamic.c:21 - CHECK failed: handle->ctx != NULL

This error is misleading because it's an expected failure path - the
code correctly falls back to searching the executable directory and
loads the library successfully. The error message causes user confusion
and makes it appear that something is broken.

Replace the CHECK macro with a simple return code so the C code fails
silently. The Go code already handles error logging appropriately:
tryLoadByName() fails silently (intentional fallback), while
tryLoadFromDir() logs via slog.Error() when explicit path loading fails.
v0.17.0-rc1
2026-02-20 23:46:07 -08:00
Parth Sareen
3445223311 cmd: openclaw onboarding (#14344) v0.17.0-rc0 2026-02-20 19:08:38 -08:00
Jeffrey Morgan
fa6c0127e6 app: expose server's default context length to UI (#14037)
Parse the default_num_ctx from the server's "vram-based default context"
log line and expose it through the inference compute API. This eliminates
duplicate VRAM tier calculation logic in the frontend.

- Add InferenceInfo struct with Computes and DefaultContextLength
- Rename GetInferenceComputer to GetInferenceInfo
- Handle missing default context line gracefully (older servers)
- Add DefaultContextLength to InferenceComputeResponse
- Update Settings UI to use server's default, disable slider while loading
- Add disabled prop to Slider component (grays out + hides handle)
- Migrate existing users with context_length=4096 to 0 (auto mode)
2026-02-20 18:56:30 -08:00
Patrick Devine
97323d1c68 consolidate the tokenizer (#14327)
This change adds a new x/tokenizer package which includes:
  * New BPE and SentencePiece tokenizers
  * Removing the dependency on the imagegen tokenizers
  * Fixes to multibyte decoding in the pipeline
  * Various correctness and benchmark tests

Not included in this PR is the WordPiece tokenizer for BERT models which will be
added when we add embedding models. The imagegen tokenizers will also be removed in
a follow-up PR.
2026-02-19 15:55:45 -08:00
natl-set
458dd1b9d9 mlx: try loading library via rpath before searching directories (#14322)
The existing code manually searches directories for libmlxc.* and passes
full paths to dlopen, bypassing the binary's rpath. This means MLX
libraries installed via package managers (e.g., Homebrew) aren't found
even when rpath is correctly set at link time.

This change adds a fallback that tries loading via rpath first (using
just the library name), before falling back to the existing directory
search. This follows standard Unix/macOS conventions and works with any
installation that sets rpath.

Fixes library loading on macOS with Homebrew-installed mlx-c without
requiring OLLAMA_LIBRARY_PATH environment variable.

Co-authored-by: Natl <nat@MacBook-Pro.local>
2026-02-19 10:55:02 -08:00
Bruce MacDonald
9d02d1d767 install: prevent partial download script execution (#14311)
Wrap script in main function so that a truncated partial download doesn't end up executing half a script.
v0.16.3-rc2 v0.16.3
2026-02-18 18:32:45 -08:00
Bruce MacDonald
1a636fb47a cmd: set codex env vars on launch and handle zstd request bodies (#14122)
The Codex runner was not setting OPENAI_BASE_URL or OPENAI_API_KEY, this prevents Codex from sending requests to api.openai.com instead of the local Ollama server. This mirrors the approach used by the Claude runner.

Codex v0.98.0 sends zstd-compressed request bodies to the /v1/responses endpoint. Add decompression support in ResponsesMiddleware with an 8MB max decompressed size limit to prevent resource exhaustion.
2026-02-18 17:19:36 -08:00
Patrick Devine
0759fface9 Revert "chore: update mlx-c bindings to 0.5.0 (#14303)" (#14316)
This reverts commit f01a9a7859.
2026-02-18 17:01:25 -08:00
Parth Sareen
325b72bc31 cmd/tui: default to single-select for editor integrations (#14302) v0.16.3-rc1 2026-02-17 18:17:27 -08:00
Patrick Devine
f01a9a7859 chore: update mlx-c bindings to 0.5.0 (#14303) 2026-02-17 16:48:16 -08:00
Patrick Devine
9aefd2dfee model: add qwen3 support to mlxrunner (#14293) v0.16.3-rc0 2026-02-17 13:58:49 -08:00
Patrick Devine
d07e4a1dd3 bugfix: better mlx model scheduling (#14290)
This fixes a bug with current MLX based models which don't get loaded/unloaded correctly. The first model currently gets loaded and then subsequent model starts get shunted to the first runner which results in the wrong model being run.
2026-02-17 13:57:05 -08:00
Parth Sareen
8a257ec00a docs: make integrations more discoverable (#14301)
* docs: add Pi integration page

* docs: flatten integration sidebar with expanded subheadings

* docs: add OpenClaw and Claude Code to quickstart
2026-02-17 13:27:25 -08:00
Parth Sareen
2f4de1acf7 cmd: ollama launch always show model picker (#14299) 2026-02-17 12:02:14 -08:00
Parth Sareen
ec95c45f70 cmd/config: ollama launch cline CLI (#14294) 2026-02-17 11:37:53 -08:00
Patrick Devine
3a88f7eb20 bugfix: add missing linear layer factory (#14289) 2026-02-16 17:22:20 -08:00
Patrick Devine
0d5da826d4 bugfix: display the parameter count correctly in mlx for ollama show (#14285) 2026-02-16 13:03:34 -08:00
Patrick Devine
9b795698b8 model: add llama3 architecture to mlxrunner (#14277) 2026-02-15 23:06:28 -08:00
Patrick Devine
041fb77639 model: add gemma3 to the mlxrunner (#14276)
This change adds the gemma3 model to the mlxrunner and simplifies some of the quantization
code for loading weights.
2026-02-15 22:47:59 -08:00
Saumil Shah
8224cce583 readme: update download link for macOS (#1) (#14271) 2026-02-15 15:25:15 -08:00
Patrick Devine
d18dcd7775 mlxrunner fixes (#14247)
* load glm4_moe_lite from the mlxrunner

* fix loading diffusion models

* remove log lines

* fix --imagegen flag
v0.16.2-rc0 v0.16.2
2026-02-13 22:30:42 -08:00
Parth Sareen
5f5ef20131 anthropic: enable websearch (#14246) 2026-02-13 19:20:46 -08:00
Parth Sareen
f0a07a353b cmd/tui: fix powershell search (#14242) 2026-02-13 15:53:11 -08:00
Devon Rifkin
948de6bbd2 add ability to disable cloud (#14221)
* add ability to disable cloud

Users can now easily opt-out of cloud inference and web search by
setting

```
"disable_ollama_cloud": true
```

in their `~/.ollama/server.json` settings file. After a setting update,
the server must be restarted.

Alternatively, setting the environment variable `OLLAMA_NO_CLOUD=1` will
also disable cloud features. While users previously were able to avoid
cloud models by not pulling or `ollama run`ing them, this gives them an
easy way to enforce that decision. Any attempt to run a cloud model when
cloud is disabled will fail.

The app's old "airplane mode" setting, which did a similar thing for
hiding cloud models within the app is now unified with this new cloud
disabled mode. That setting has been replaced with a "Cloud" toggle,
which behind the scenes edits `server.json` and then restarts the
server.

* gate cloud models across TUI and launch flows when cloud is disabled

Block cloud models from being selected, launched, or written to
integration configs when cloud mode is turned off:

- TUI main menu: open model picker instead of launching with a
  disabled cloud model
- cmd.go: add IsCloudModelDisabled checks for all Selection* paths
- LaunchCmd: filter cloud models from saved Editor configs before
  launch, fall through to picker if none remain
- Editor Run() methods (droid, opencode, openclaw): filter cloud
  models before calling Edit() and persist the cleaned list
- Export SaveIntegration, remove SaveIntegrationModel wrapper that
  was accumulating models instead of replacing them

* rename saveIntegration to SaveIntegration in config.go and tests

* cmd/config: add --model guarding and empty model list fixes

* Update docs/faq.mdx

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update internal/cloud/policy.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update internal/cloud/policy.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update server/routes.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Revert "Update internal/cloud/policy.go"

This reverts commit 8bff8615f9.

Since this error shows up in other integrations, we want it to be
prefixed with Ollama

* rename cloud status

* more status renaming

* fix tests that weren't updated after rename

---------

Co-authored-by: ParthSareen <parth.sareen@ollama.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2026-02-12 15:47:00 -08:00
Parth Sareen
598b74d42c cmd/config: add minimax-m2.5 (#14223) v0.16.1 2026-02-12 14:29:50 -08:00
Jeffrey Morgan
935a48ed1a scripts: skip macOS symlink creation if already correct (#14142) 2026-02-12 12:44:42 -08:00
Daniel Hiltgen
de39e24bf7 win: progress reporting on install download (#14219)
* win: progress reporting on install download

Downloading Ollama...
  [####################################    ] 91%  1106.6 / 1204.2 MB

* review comments
2026-02-12 12:06:56 -08:00
Eva H
519b11eba1 site: update readme (#14217) 2026-02-12 12:14:13 -05:00
Eva H
379fd64fa8 Revert "update README (#14213)" (#14215) 2026-02-12 12:06:00 -05:00
frob
59c019a6fb x: configurable model load timeout (#14204)
Co-authored-by: rick <rick@frob.com.au>
2026-02-12 09:05:42 -08:00
Eva H
fad3bcccb2 update README (#14213) 2026-02-12 11:59:42 -05:00
Bruce MacDonald
bd6697ad95 docs: update quickstart for tui (#14208) 2026-02-12 08:44:33 -08:00
SamareshSingh
f8dc7c9f54 docs: fix openapi schema for /api/ps and /api/tags endpoints (#14210) v0.16.0 v0.16.0-rc2 2026-02-11 17:37:40 -08:00
Patrick Devine
4a3741129d bug: fix loading non-mlx models when ollama is built with mlx support (#14211)
This change fixes an issue where GGML based models (for either the Ollama runner or
the legacy llama.cpp runner) would try to load the mlx library. That would panic
and the model fails to start.
v0.16.0-rc1 v0.16.0-rc0
2026-02-11 14:48:33 -08:00
Parth Sareen
77ba9404ac cmd/tui: improve model picker UX (#14209) 2026-02-11 14:36:54 -08:00
Patrick Devine
0aaf6119ec feature: add ctrl-g to allow users to use an editor to edit their prompt (#14197) 2026-02-11 13:04:41 -08:00
Parth Sareen
f08427c138 cmd: TUI UX improvements (#14198) 2026-02-11 10:18:41 -08:00
Maternion
2dbb000908 update context length format. 2026-02-10 17:06:05 -08:00
Maternion
c980e19995 Fix formatting of context length notes in documentation 2026-02-10 17:06:05 -08:00
Maternion
6162374ca9 Update context-length.mdx 2026-02-10 17:06:05 -08:00
Patrick Devine
44bdd9a2ef Add MLX runner with GLM4-MoE-Lite model support (#14185)
This change adds a new MLX based runner which includes:

  * Method-based MLX bindings
  * Subprocess-based MLX runner (x/mlxrunner)
  * KV cache with tree management
  * A basic sampler

The GLM4-MoE-Lite model has been ported to use the new bindings.

---------

Co-authored-by: Michael Yang <git@mxy.ng>
2026-02-10 14:57:57 -08:00