20 of 20 workloads now schema-valid; 9 of 11 measurable workloads have
evidence-bound regime values backed by sidecars in roofline/. The
linter passes with --verify-against-sidecars across the suite. 13 prior
guess-classifications were corrected by measurement; the surprises
(DLRM compute-bound, ResNet bandwidth-bound, Diffusion bandwidth-bound)
will inform paper prose. Branch parked.
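For illustration, the sidecar check described above might look roughly like the sketch below. It is not the code in check_taxonomy.py: the function name, the Mismatch record, the flat workload-to-regime mappings, and the workload names are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mismatch:
    """One workload whose claimed regime disagrees with its sidecar."""
    workload: str
    claimed: str
    measured: str


def verify_against_sidecars(claims, sidecars):
    """Compare claimed regimes with measured ones.

    Both arguments are hypothetical {workload: regime} mappings.
    Workloads without a sidecar are skipped, not failed, since not
    every workload is measurable.
    """
    mismatches = []
    for workload, claimed in sorted(claims.items()):
        measured = sidecars.get(workload)
        if measured is None:
            continue  # no measurement available for this workload
        if measured != claimed:
            mismatches.append(Mismatch(workload, claimed, measured))
    return mismatches


# Made-up data: workload_a was guessed wrong, workload_c is unmeasured.
claims = {
    "workload_a": "bandwidth-bound",
    "workload_b": "compute-bound",
    "workload_c": "compute-bound",
}
sidecars = {
    "workload_a": "compute-bound",
    "workload_b": "compute-bound",
}
result = verify_against_sidecars(claims, sidecars)
```

A linter built this way would exit non-zero whenever the returned list is non-empty.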
Folds in: bench/measure_peaks.py (real per-machine peak FLOPS + BW
measurement), roofline.py reading from cache, manifest.py rejecting
dirty trees on closed division, check_taxonomy.py
--verify-against-sidecars flag, nanogpt_prefill emitting sidecars.
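A minimal sketch of the dirty-tree rejection, assuming it is driven by the output of `git status --porcelain`. The function names and the "closed" division string are assumptions for illustration; manifest.py may do this differently.

```python
import subprocess


def git_porcelain(repo_dir="."):
    """Return the raw output of `git status --porcelain` for repo_dir."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def porcelain_is_dirty(porcelain):
    """True if the porcelain output lists any change.

    Untracked files (lines starting with '??') count as dirty here;
    that is a choice made for this sketch.
    """
    return any(line.strip() for line in porcelain.splitlines())


def check_submittable(division, porcelain):
    """Raise ValueError for a closed-division run on a dirty tree."""
    if division == "closed" and porcelain_is_dirty(porcelain):
        raise ValueError(
            "closed-division manifests require a clean working tree"
        )
```

In use, `check_submittable("closed", git_porcelain())` would run before the manifest is written, so a result can never be attributed to a commit that does not contain the code that produced it.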
Empirical findings: hardcoded M1 peaks were 5.5-7.7x off for this
machine (M-series Pro/Max). The --verify-against-sidecars flag caught
a YAML claim that didn't survive real measurement (nanogpt-prefill
dispatch claim was calibrated against wrong peaks).
Branch parked. 6 of 10 iterations complete (counting 5.5).
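Why wrong peaks matter, as a sketch using the standard roofline model: the ridge point is peak FLOPS divided by peak bandwidth, and a kernel is compute-bound only if its arithmetic intensity is at or above it. The numbers below are invented for illustration and are not the measured values from this machine; the function names are also assumptions, and the sketch covers only the two-way compute/bandwidth split, not the full regime schema.

```python
def ridge_point(peak_flops, peak_bw):
    """Arithmetic intensity (FLOP/byte) where the two roofs meet."""
    return peak_flops / peak_bw


def attainable_flops(intensity, peak_flops, peak_bw):
    """Roofline bound: min of the compute roof and the bandwidth roof."""
    return min(peak_flops, intensity * peak_bw)


def classify(intensity, peak_flops, peak_bw):
    """Two-way regime split based on the ridge point."""
    if intensity >= ridge_point(peak_flops, peak_bw):
        return "compute-bound"
    return "bandwidth-bound"


# Invented example: one kernel at 50 FLOP/byte, two sets of peaks.
intensity = 50.0

# Hypothetical hardcoded peaks: ridge at 20 FLOP/byte.
guessed = classify(intensity, peak_flops=2e12, peak_bw=100e9)

# Hypothetical measured peaks: ridge at 100 FLOP/byte.
measured = classify(intensity, peak_flops=1.2e13, peak_bw=1.2e11)
```

With these invented peaks the same kernel flips from compute-bound to bandwidth-bound, which is the kind of miscalibration a check against measured sidecars is meant to catch.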
Snapshots iter-4 from standalone repo. Adds tools/check_taxonomy.py
(linter that gates schema completeness + threshold consistency) and
the migration that converted all 20 workload entries to use the new
3-axis regime schema. Working group sign-off: Emer (proposer + verifier).
Branch parked; not for merge to dev.