ollama

mirror of https://github.com/ollama/ollama.git synced 2026-05-05 23:53:43 -05:00

Files

Patrick Devine 15e6076d79 mlx: Gemma4 MTP speculative decoding (#15980)

This change adds support for MTP (multi-token prediction) speculative decoding for the
gemma4 model family.

It includes:
  * support for importing safetensors-based gemma4 draft models with `ollama create`
  * a new DRAFT command in the Modelfile for specifying draft models
  * a `--quantize-draft` flag for the `ollama create` command to quantize the draft model
  * cache support for speculation
  * changes to the rotating cache to be able to handle MTP correctly
  * sampling support for draft model token prediction
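
For readers unfamiliar with the technique, the draft-and-verify idea behind speculative decoding can be sketched as below. This is a toy illustration, not the code from this commit: the names `speculative_step` and `generate` are hypothetical, the "models" are plain Python functions, and only greedy decoding is shown. A real implementation would typically verify all drafted tokens in one batched forward pass of the target model and would also handle sampling and cache rollback.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(target: Model, draft: Model, prefix: List[int], k: int) -> List[int]:
    """Run one draft-and-verify round and return the accepted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed token in order.
    #    (Sequential here for clarity; assumed to be batched in practice.)
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target adds one extra token.
    accepted.append(target(ctx))
    return accepted


def generate(target: Model, draft: Model, prompt: List[int], n_tokens: int, k: int = 4) -> List[int]:
    """Generate n_tokens after the prompt using repeated speculative rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + n_tokens]


if __name__ == "__main__":
    # Toy models: the target counts upward mod 10; the draft agrees except after a 4.
    target = lambda ctx: (ctx[-1] + 1) % 10
    draft = lambda ctx: 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
    print(generate(target, draft, [0], 12))
```

Because every emitted token is either confirmed or supplied by the target model, the greedy output matches what the target would produce on its own; the draft model only changes how many target evaluations can be grouped per round.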

---------

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>

2026-05-05 08:55:04 -07:00

support

app: add code for macOS and Windows apps under 'app' (#12933)

2025-11-04 11:40:17 -08:00

.this-is-the-create-dmg-repo

app: add code for macOS and Windows apps under 'app' (#12933)

2025-11-04 11:40:17 -08:00

build_darwin.sh

Update MLX and MLX-C with threading fixes (#15845)

2026-05-03 10:03:14 -07:00

build_docker.sh

…