ollama-ollama

mirror of https://github.com/ollama/ollama.git synced 2026-03-09 03:12:11 -05:00

Author	SHA1	Message	Date
Jesse Gross	4d5ff25724	mlxrunner: Report actual memory usage from runner The MLX runner previously reported a static VRAM estimate that was computed at load time and consisted only of the weights. This is strictly less than the actual memory usage, as it does not include the KV cache or compute graph.	2026-02-25 15:06:37 -08:00
Jesse Gross	0f23b7bff5	mlxrunner: Cancel in-flight requests when the client disconnects Currently, a canceled request can result in computation continuing in the background to completion. It can also trigger a deadlock when there is nobody to read the output tokens and the pipeline cannot continue to the next request.	2026-02-25 14:00:42 -08:00
Jesse Gross	4e57d2094e	mlxrunner: Simplify pipeline memory and cache management Particularly in error cases, it can be difficult to ensure that all pinned memory is unpinned, MLX buffers are released and cache state is consistent. This encapsulates those pieces and sets up proper deferrals so that this happens automatically on exit.	2026-02-25 14:00:42 -08:00
Jeffrey Morgan	7f9efd53df	model: add support for qwen3.5-27b model (#14415 ) v0.17.1-rc2	2026-02-25 01:09:58 -08:00
Jeffrey Morgan	da70c3222e	model: support for qwen3.5 architecture (#14378 ) v0.17.1-rc1	2026-02-24 20:08:05 -08:00
Bruce MacDonald	9d902d63ce	ggml: ensure tensor size is valid (#14406 ) When quantizing tensors during model creation validate that the resulting sizes match what is expected based on the shape.	2026-02-24 21:52:44 -04:00
Daniel Hiltgen	f4f0a4a471	update mlx-c bindings to 0.5.0 (#14380 ) * chore: update mlx-c bindings to 0.5.0 (#14303) * linux: use gcc 11 --------- Co-authored-by: Patrick Devine <patrick@infrahq.com> v0.17.1-rc0	2026-02-23 16:44:29 -08:00
Eva H	3323c1d319	app: add upgrade configuration to settings page (#13512 )	2026-02-23 18:08:52 -05:00
Jesse Gross	f20dc6b698	mlx: don't default to affine quantization for unquantized models Otherwise the BF16 version of models trigger segfaults when they call into quantized kernels.	2026-02-23 15:03:53 -08:00
Jeffrey Morgan	4b2ac1f369	model: improvements to LFM architectures (#14368 )	2026-02-23 14:38:10 -08:00
Jesse Gross	8daf47fb3a	mlxrunner: Fix duplicate log prefixes and reduce log noise Pass subprocess stdout/stderr through to the parent's stderr directly instead of re-wrapping each line with slog. The subprocess already writes structured slog output, so the re-wrapping produced nested timestamps, levels, and message fields that were hard to read. Also downgrade verbose KV cache debug logs to trace level.	2026-02-23 14:09:20 -08:00
Eva H	6c980579cd	ui: use capability-based detection for web search (#14336 )	2026-02-23 15:00:09 -05:00
Jesse Gross	5c73c4e2ee	mlxrunner: Simplify KV cache to single-entry prefix matching The KV cache previously used a tree structure which could store multiple divergent sequences, which is good for cache reuse. However, this is typically used in conjunction with paged attention so each node in the tree can store just a chunk of the KV cache and they can be stitched together later. We don't currently do this, so the cache was storing copies of the full cache for each past sequence. This redundancy plus the lack of resource limits, caused significant memory use as a conversation grew. Instead, this changes to store a single entry for the cache, which can be prefix matched. Although it is less ideal for multiple users, it largely matches Ollama's current behavior. It can be improved as additional pieces are fleshed out.	2026-02-23 09:50:07 -08:00
Jesse Gross	5daf59cc66	mlxrunner: Fix memory leaks with pin/sweep lifecycle management The previous approach tracked array lifecycles through reference counting, where each array recorded its inputs and a reference count that was decremented as dependents were freed. This is not really necessary as MLX tracks references internally. It is also error prone as it is easy to create new arrays and forget to free them when the Go variable goes out of scope. Instead, we can pin just the arrays we want (typically outputs and specific intermediates, like the cache). All other arrays are freed by default when we run sweep. This avoids most causes of memory leaks while still giving the freedom to save what we want.	2026-02-23 09:50:07 -08:00
Jeffrey Morgan	0ade9205cc	models: add nemotronh architecture support (#14356 )	2026-02-22 15:09:14 -08:00
Parth Sareen	06edabdde1	cmd/config: install web search plugin to user-level extensions dir (#14362 ) v0.17.0-rc2 v0.17.0	2026-02-22 02:17:03 -08:00
Jeffrey Morgan	8b4e5a82a8	mlx: remove noisy error output from dynamic library loading (#14346 ) The recent change in #14322 added tryLoadByName() which attempts to load libmlxc.dylib via rpath before searching directories. This is an optimization for Homebrew installations where rpath is correctly set. However, when rpath isn't set (which is the common case for app bundle installations), dlopen fails and the CHECK macro prints an error to stderr: ERROR - dynamic.c:21 - CHECK failed: handle->ctx != NULL This error is misleading because it's an expected failure path - the code correctly falls back to searching the executable directory and loads the library successfully. The error message causes user confusion and makes it appear that something is broken. Replace the CHECK macro with a simple return code so the C code fails silently. The Go code already handles error logging appropriately: tryLoadByName() fails silently (intentional fallback), while tryLoadFromDir() logs via slog.Error() when explicit path loading fails. v0.17.0-rc1	2026-02-20 23:46:07 -08:00
Parth Sareen	3445223311	cmd: openclaw onboarding (#14344 ) v0.17.0-rc0	2026-02-20 19:08:38 -08:00
Jeffrey Morgan	fa6c0127e6	app: expose server's default context length to UI (#14037 ) Parse the default_num_ctx from the server's "vram-based default context" log line and expose it through the inference compute API. This eliminates duplicate VRAM tier calculation logic in the frontend. - Add InferenceInfo struct with Computes and DefaultContextLength - Rename GetInferenceComputer to GetInferenceInfo - Handle missing default context line gracefully (older servers) - Add DefaultContextLength to InferenceComputeResponse - Update Settings UI to use server's default, disable slider while loading - Add disabled prop to Slider component (grays out + hides handle) - Migrate existing users with context_length=4096 to 0 (auto mode)	2026-02-20 18:56:30 -08:00
Patrick Devine	97323d1c68	consolidate the tokenizer (#14327 ) This change adds a new x/tokenizer package which includes: * New BPE and SentencePiece tokenizers * Removing the dependency on the imagegen tokenizers * Fixes to multibyte decoding in the pipeline * Various correctness and benchmark tests Not included in this PR is the WordPiece tokenizer for BERT models which will be added when we add embedding models. The imagegen tokenizers will also be removed in a follow-up PR.	2026-02-19 15:55:45 -08:00
natl-set	458dd1b9d9	mlx: try loading library via rpath before searching directories (#14322 ) The existing code manually searches directories for libmlxc.* and passes full paths to dlopen, bypassing the binary's rpath. This means MLX libraries installed via package managers (e.g., Homebrew) aren't found even when rpath is correctly set at link time. This change adds a fallback that tries loading via rpath first (using just the library name), before falling back to the existing directory search. This follows standard Unix/macOS conventions and works with any installation that sets rpath. Fixes library loading on macOS with Homebrew-installed mlx-c without requiring OLLAMA_LIBRARY_PATH environment variable. Co-authored-by: Natl <nat@MacBook-Pro.local>	2026-02-19 10:55:02 -08:00
Bruce MacDonald	9d02d1d767	install: prevent partial download script execution (#14311 ) Wrap script in main function so that a truncated partial download doesn't end up executing half a script. v0.16.3 v0.16.3-rc2	2026-02-18 18:32:45 -08:00
Bruce MacDonald	1a636fb47a	cmd: set codex env vars on launch and handle zstd request bodies (#14122 ) The Codex runner was not setting OPENAI_BASE_URL or OPENAI_API_KEY, this prevents Codex from sending requests to api.openai.com instead of the local Ollama server. This mirrors the approach used by the Claude runner. Codex v0.98.0 sends zstd-compressed request bodies to the /v1/responses endpoint. Add decompression support in ResponsesMiddleware with an 8MB max decompressed size limit to prevent resource exhaustion.	2026-02-18 17:19:36 -08:00
Patrick Devine	0759fface9	Revert "chore: update mlx-c bindings to 0.5.0 (#14303 )" (#14316 ) This reverts commit `f01a9a7859`.	2026-02-18 17:01:25 -08:00
Parth Sareen	325b72bc31	cmd/tui: default to single-select for editor integrations (#14302 ) v0.16.3-rc1	2026-02-17 18:17:27 -08:00
Patrick Devine	f01a9a7859	chore: update mlx-c bindings to 0.5.0 (#14303 )	2026-02-17 16:48:16 -08:00
Patrick Devine	9aefd2dfee	model: add qwen3 support to mlxrunner (#14293 ) v0.16.3-rc0	2026-02-17 13:58:49 -08:00
Patrick Devine	d07e4a1dd3	bugfix: better mlx model scheduling (#14290 ) This fixes a bug with current MLX based models which don't get loaded/unloaded correctly. The first model currently gets loaded and then subsequent model starts get shunted to the first runner which results in the wrong model being run.	2026-02-17 13:57:05 -08:00
Parth Sareen	8a257ec00a	docs: make integrations more discoverable (#14301 ) * docs: add Pi integration page * docs: flatten integration sidebar with expanded subheadings * docs: add OpenClaw and Claude Code to quickstart	2026-02-17 13:27:25 -08:00
Parth Sareen	2f4de1acf7	cmd: ollama launch always show model picker (#14299 )	2026-02-17 12:02:14 -08:00
Parth Sareen	ec95c45f70	cmd/config: ollama launch cline CLI (#14294 )	2026-02-17 11:37:53 -08:00
Patrick Devine	3a88f7eb20	bugfix: add missing linear layer factory (#14289 )	2026-02-16 17:22:20 -08:00
Patrick Devine	0d5da826d4	bugfix: display the parameter count correctly in mlx for ollama show (#14285 )	2026-02-16 13:03:34 -08:00
Patrick Devine	9b795698b8	model: add llama3 architecture to mlxrunner (#14277 )	2026-02-15 23:06:28 -08:00
Patrick Devine	041fb77639	model: add gemma3 to the mlxrunner (#14276 ) This change adds the gemma3 model to the mlxrunner and simplifies some of the quantization code for loading weights.	2026-02-15 22:47:59 -08:00
Saumil Shah	8224cce583	readme: update download link for macOS (#1 ) (#14271 )	2026-02-15 15:25:15 -08:00
Patrick Devine	d18dcd7775	mlxrunner fixes (#14247 ) * load glm4_moe_lite from the mlxrunner * fix loading diffusion models * remove log lines * fix --imagegen flag v0.16.2-rc0 v0.16.2	2026-02-13 22:30:42 -08:00
Parth Sareen	5f5ef20131	anthropic: enable websearch (#14246 )	2026-02-13 19:20:46 -08:00
Parth Sareen	f0a07a353b	cmd/tui: fix powershell search (#14242 )	2026-02-13 15:53:11 -08:00
Devon Rifkin	948de6bbd2	add ability to disable cloud (#14221 ) * add ability to disable cloud Users can now easily opt-out of cloud inference and web search by setting ``` "disable_ollama_cloud": true ``` in their `~/.ollama/server.json` settings file. After a setting update, the server must be restarted. Alternatively, setting the environment variable `OLLAMA_NO_CLOUD=1` will also disable cloud features. While users previously were able to avoid cloud models by not pulling or `ollama run`ing them, this gives them an easy way to enforce that decision. Any attempt to run a cloud model when cloud is disabled will fail. The app's old "airplane mode" setting, which did a similar thing for hiding cloud models within the app is now unified with this new cloud disabled mode. That setting has been replaced with a "Cloud" toggle, which behind the scenes edits `server.json` and then restarts the server. * gate cloud models across TUI and launch flows when cloud is disabled Block cloud models from being selected, launched, or written to integration configs when cloud mode is turned off: - TUI main menu: open model picker instead of launching with a disabled cloud model - cmd.go: add IsCloudModelDisabled checks for all Selection* paths - LaunchCmd: filter cloud models from saved Editor configs before launch, fall through to picker if none remain - Editor Run() methods (droid, opencode, openclaw): filter cloud models before calling Edit() and persist the cleaned list - Export SaveIntegration, remove SaveIntegrationModel wrapper that was accumulating models instead of replacing them * rename saveIntegration to SaveIntegration in config.go and tests * cmd/config: add --model guarding and empty model list fixes * Update docs/faq.mdx Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com> * Update internal/cloud/policy.go Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com> * Update internal/cloud/policy.go Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com> * Update server/routes.go Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com> * Revert "Update internal/cloud/policy.go" This reverts commit `8bff8615f9`. Since this error shows up in other integrations, we want it to be prefixed with Ollama * rename cloud status * more status renaming * fix tests that weren't updated after rename --------- Co-authored-by: ParthSareen <parth.sareen@ollama.com> Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2026-02-12 15:47:00 -08:00
Parth Sareen	598b74d42c	cmd/config: add minimax-m2.5 (#14223 ) v0.16.1	2026-02-12 14:29:50 -08:00
Jeffrey Morgan	935a48ed1a	scripts: skip macOS symlink creation if already correct (#14142 )	2026-02-12 12:44:42 -08:00
Daniel Hiltgen	de39e24bf7	win: progress reporting on install download (#14219 ) * win: progress reporting on install download Downloading Ollama... [#################################### ] 91% 1106.6 / 1204.2 MB * review comments	2026-02-12 12:06:56 -08:00
Eva H	519b11eba1	site: update readme (#14217 )	2026-02-12 12:14:13 -05:00
Eva H	379fd64fa8	Revert "update README (#14213 )" (#14215 )	2026-02-12 12:06:00 -05:00
frob	59c019a6fb	x: configurable model load timeout (#14204 ) Co-authored-by: rick <rick@frob.com.au>	2026-02-12 09:05:42 -08:00
Eva H	fad3bcccb2	update README (#14213 )	2026-02-12 11:59:42 -05:00
Bruce MacDonald	bd6697ad95	docs: update quickstart for tui (#14208 )	2026-02-12 08:44:33 -08:00
SamareshSingh	f8dc7c9f54	docs: fix openapi schema for /api/ps and /api/tags endpoints (#14210 ) v0.16.0-rc2 v0.16.0	2026-02-11 17:37:40 -08:00
Patrick Devine	4a3741129d	bug: fix loading non-mlx models when ollama is built with mlx support (#14211 ) This change fixes an issue where GGML based models (for either the Ollama runner or the legacy llama.cpp runner) would try to load the mlx library. That would panic and the model fails to start. v0.16.0-rc1 v0.16.0-rc0	2026-02-11 14:48:33 -08:00

1 2 3 4 5 ...

5115 Commits