This change adds support for MTP (multi-token prediction) speculative decoding for the
gemma4 model family.
It includes:
* support for importing safetensors-based gemma4 draft models with `ollama create`
* a new `DRAFT` command in the Modelfile for specifying draft models
* a `--quantize-draft` flag for `ollama create` to quantize the draft model
* cache support for speculation
* changes to the rotating cache so that it handles MTP correctly
* sampling support for draft model token prediction
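
For readers unfamiliar with speculative decoding, the verification step can be
sketched roughly as below. This is a generic illustration of greedy draft
acceptance, not the code in this change: `acceptGreedy` and the integer token
IDs are hypothetical, and the sampling support added here may use a different
acceptance rule.

```go
package main

import "fmt"

// acceptGreedy is a hypothetical helper illustrating greedy verification.
// draft holds k tokens proposed by the draft model. target holds the k+1
// tokens the target model would pick greedily at the same positions (one
// per draft position, plus one after the last draft token).
// It returns the tokens to emit: the longest matching prefix of the draft,
// followed by one token from the target (a correction at the first
// mismatch, or a bonus token if every draft token matched).
func acceptGreedy(draft, target []int) []int {
	if len(target) != len(draft)+1 {
		panic("target must contain exactly one more token than draft")
	}
	n := 0
	for n < len(draft) && draft[n] == target[n] {
		n++
	}
	// target[:n] equals draft[:n]; target[n] is the correction or bonus.
	out := make([]int, n+1)
	copy(out, target[:n+1])
	return out
}

func main() {
	// Third draft token is rejected and replaced by the target's choice.
	fmt.Println(acceptGreedy([]int{5, 7, 9}, []int{5, 7, 2, 4})) // [5 7 2]
	// All draft tokens accepted, plus one bonus token from the target.
	fmt.Println(acceptGreedy([]int{5, 7, 9}, []int{5, 7, 9, 4})) // [5 7 9 4]
}
```

Under this rule each verification pass emits at least one token, so output
matches plain greedy decoding with the target model while potentially taking
fewer target passes.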
---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>