Mirror of https://github.com/ollama/ollama.git, synced 2026-03-09 07:16:38 -05:00
The MLX runner previously reported a static VRAM estimate that was computed once at load time and covered only the model weights. That estimate is strictly lower than actual memory usage, because it excludes the KV cache and the compute graph.
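
To illustrate why a weights-only figure undercounts, here is a minimal sketch in Python. It is not the runner's code: `ModelShape`, `kv_cache_bytes`, `static_estimate`, and `live_estimate` are hypothetical names, and the model dimensions and the compute graph size are made-up inputs. The KV cache term uses the usual sizing for a transformer that stores one key and one value vector per layer, per KV head, per token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical description of a loaded model (illustrative only)."""
    weight_bytes: int           # size of the weights, known at load time
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_element: int = 2  # assumes an fp16 KV cache


def kv_cache_bytes(shape: ModelShape, context_tokens: int) -> int:
    """Bytes needed to hold keys and values for `context_tokens` tokens."""
    per_token = (
        2  # one key vector plus one value vector
        * shape.num_layers
        * shape.num_kv_heads
        * shape.head_dim
        * shape.bytes_per_element
    )
    return per_token * context_tokens


def static_estimate(shape: ModelShape) -> int:
    """Weights-only estimate, as described in the text above."""
    return shape.weight_bytes


def live_estimate(shape: ModelShape, context_tokens: int, graph_bytes: int) -> int:
    """Weights plus KV cache plus compute graph scratch space.

    `graph_bytes` is passed in as an assumption; this sketch does not try
    to derive it.
    """
    return shape.weight_bytes + kv_cache_bytes(shape, context_tokens) + graph_bytes


if __name__ == "__main__":
    # Made-up dimensions chosen only to make the arithmetic easy to follow.
    shape = ModelShape(weight_bytes=4 * 1024**3, num_layers=32,
                       num_kv_heads=8, head_dim=128)
    gib = 1024**3
    print(f"static: {static_estimate(shape) / gib:.2f} GiB")
    print(f"live:   {live_estimate(shape, 8192, 512 * 1024**2) / gib:.2f} GiB")
```

With these assumed dimensions, an 8192-token context adds exactly 1 GiB of KV cache on top of the weights, so the gap between the two estimates grows with context length.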