Align public README and site messaging around the curriculum components,
adoption paths, and current early-release status so newcomers can move
from reading to building, deploying, practicing, and teaching.
Add `text` language tag to 25 unlabeled fenced code blocks across the
public-facing READMEs. Mostly directory-tree listings, all-contributors
bot instructions, and pseudo-output ASCII blocks — none were getting
syntax highlighting anyway, but the explicit tag silences markdownlint
MD040 and signals intent ("this is plain text, not a forgotten lang").
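The tagging pass is mechanical enough to script; a sketch of the transformation (hypothetical helper — the actual edits may have been applied by hand):

```python
def tag_bare_fences(md: str) -> str:
    """Label opening ``` fences that carry no language as `text`, which
    silences markdownlint MD040. Closing fences (the ones that follow an
    open fence) are left untouched."""
    out, in_block = [], False
    for line in md.splitlines():
        if line.strip().startswith("```"):
            if not in_block and line.strip() == "```":
                line = line.replace("```", "```text", 1)
            in_block = not in_block
        out.append(line)
    return "\n".join(out)
```

Tracking the open/close state is what keeps the pass from mangling closing fences, which are also bare ``` lines.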
GitHub's github-markdown-css applies:
.markdown-body table { display: block; width: max-content; max-width: 100%; }
The HTML width="100%" attribute is only a presentational hint, which
the cascade ranks below any author-stylesheet selector, so tables with
short cell content were sizing to max-content instead of stretching to
fill the column.
Tables with long sentences per cell stretched fine, masking the bug.
Add inline style="width:100%" (specificity 1,0,0,0) which overrides
the class-selector rule. Keep width="100%" attribute as a fallback for
non-GitHub renderers (VSCode preview, GitLab, plain HTML viewers).
54 tables updated across 10 READMEs + the two contributor-sync scripts
that regenerate auto-managed tables.
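The fix reduces to a one-line substitution; a sketch of what the sync scripts now have to emit (function name hypothetical):

```python
import re

def force_full_width(html: str) -> str:
    """Give bare <table width="100%"> tags an inline style so the
    declaration outranks github-markdown-css's class-selector rule,
    while keeping the attribute as a fallback for other renderers."""
    return re.sub(
        r'<table width="100%">',
        '<table width="100%" style="width:100%">',
        html,
    )
```

Tables that already carry a style attribute don't match the pattern, so re-running the pass is idempotent.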
The sub-project READMEs used an old-school nested-table card design
with hardcoded bgcolor="#ffffff", "#cfd6dd", "#eef2f7" plus deprecated
HTML4 attributes (cellpadding, cellspacing, border). It looked good in
light mode but produced harsh white islands in GitHub's dark theme,
which is what most readers see today.
Across 11 sub-READMEs:
- Strip the card wrapper so data tables are just clean
<table width="100%"> with semantic <thead>/<tbody>. Headers keep
their column widths; bgcolor/valign/zebra-stripe cruft is removed
(GitHub provides its own theme-aware striping).
- Convert the early-release callouts (and mlperf-edu's two-tier
status block + "source of truth" note + interviews' two info boxes)
to GitHub-native > [!NOTE] / > [!WARNING] / > [!TIP] callouts.
These are theme-aware, get proper icons, and render correctly in
light AND dark mode.
Net result: 528 lines of HTML cruft removed, 230 lines of clean
markdown added. Visual identity is preserved (callouts still stand
out, tables still stretch full-width) while becoming dark-mode safe
and consistent with the main README.
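The conversion target is simple to state precisely; a sketch of the emitter (helper name hypothetical):

```python
def to_github_callout(kind: str, body_lines: list[str]) -> list[str]:
    """Render a GitHub-native callout: a > [!KIND] marker line followed
    by the body, each line blockquoted. GitHub supplies the icon and
    theme-aware styling, so no hardcoded colors survive into dark mode."""
    return [f"> [!{kind.upper()}]"] + [f"> {line}" for line in body_lines]
```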
- Add wrap_readme_data_tables.py to frame <table>+<thead>/<tbody> blocks in a
98% width panel (#cfd6dd border, #eef2f7 headers, and zebra body rows where
they had been applied manually in already-converted tables).
- Apply wraps to book, kits, labs, slides, tinytorch; tbody wraps for kits
docs/related and instructors overview.
- Convert remaining Markdown tables in mlsysim, mlperf-edu, and interviews to
the same HTML pattern; replace StaffML markdown callouts with HTML panels.
- Add thead rows to kits/instructors body-only tables for clearer hierarchy.
- mlperf-edu: stacked HTML panels (amber under construction + gray early-work
note) at top; replace workload-table markdown quote with HTML callout.
- slides, labs, mlsysim: rename DEV-BANNER comments to EARLY-RELEASE-CALLOUT
for consistency with other project READMEs.
Every hook now passes on `pre-commit run --all-files` (exit 0 after one
auto-fix pass + one verification pass — the standard pre-commit contract).
Unblocks book-validate-dev, which has been red on various hooks since the
mlsysim.core import failure finally cleared.
Fixes applied (source-traced, not suppressed):
1. codespell: 'OT' in mlperf-edu/reference/cloud/micro_lstm.py is the
column name for Oil Temperature in the ETTh1 dataset (Zhou et al.,
AAAI 2021), not a typo for 'to/of/or/not/it'. Added 'ot' to
.codespell-ignore-words.txt (case-insensitive, covers OT).
2. bib-lint §5 bibliography hygiene: 12 entries in
mlperf-edu/paper/refs.bib missing required publisher/journal per the
canonical mapping in book-prose-merged.md §5. Added canonical
publishers (MLSys → mlsys.org, ICLR → OpenReview.net, CVPR → IEEE,
NAACL → ACL, etc.); promoted krizhevsky2009cifar from @article to
@techreport with institution = University of Toronto. banbury2021mlperf
uses Curran Associates Inc. (pre-2022 NeurIPS rule); flagged
banbury2024wakevision for author review since the booktitle says CVPR
but web verification suggests it is still an arXiv preprint.
3. Over-eager 'vs.' style sweep corrupted anchor IDs: 77 instances of
-vs.- inside {#sec-...}, {#tbl-...}, {#fig-...} definitions and their
@-references across 27 QMDs. Anchor IDs must be literal strings without
periods per the repo's own section-ID naming rule; stripped the period
from all anchor tokens while preserving 'vs.' in visible prose.
4. 4 broken SVG filename references from the same sweep (pam4-vs.-nrz,
traditional-vs.-ml-fleet, tco-build-vs.-buy, centralized-vs.-decentralized)
— filenames on disk use vs- (no period) so refs restored to match.
5. Malformed XML declaration in bathtub-curve.svg:
'<?xml version="utf-8"?>' → '<?xml version="1.0" encoding="utf-8"?>'.
6. 21 quad-asterisks (****term****) in training.qmd collapsed to **term**.
7. bibtex-tidy auto-reformatted mlperf-edu/paper/refs.bib (alphabetical
order + consistent indentation + wrapped author lists) and pipe-table
prettifier realigned columns across ~20 QMDs. These are all cosmetic
formatter output — no content changes.
Verified: pre-commit run --all-files run #1 modified files (exit 1),
run #2 exit 0 with 61 Passed / 0 Failed.
Replace hardcoded /Users/VJ/GitHub/mlperf-edu/paper/figures/... paths
in generate_all_curves.py and generate_all_curves_v2.py with paths
derived from os.path.dirname(__file__), so the figure-generation
scripts work for any user/checkout location.
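The change amounts to anchoring paths at the script itself; a minimal sketch (the figures/ directory name is assumed from the old hardcoded path):

```python
import os

def figures_dir(script_path: str) -> str:
    """Resolve the figures/ output directory relative to the generating
    script, replacing the hardcoded /Users/... checkout path so the
    script runs from any clone location."""
    return os.path.join(os.path.dirname(os.path.abspath(script_path)), "figures")

# Inside generate_all_curves.py this would be called as figures_dir(__file__).
```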
Eight surgical updates to the paper to reflect what the 10-iteration loop
discovered. Does NOT rewrite the whole paper; preserves structure and
contribution ordering.
1. Contribution #3 rewrite (line 180): replace "Bottleneck-diverse
canonical core" with "Three-axis regime taxonomy." Introduces the
orthogonality argument (batch size shifts dispatch, table size shifts
working set, prefill->decode shifts intensity) as the paper's new
primary classification frame.
2. Workload Suite intro (line 234): replace "five fundamentally distinct
system bottleneck classes" with the 3-axis framing. Add a paragraph
("Regime classification") describing the M1-base reference
thresholds, the unmeasured grey-band policy, and the iter-5.6 finding
that 15 static-analysis labels did not survive measurement. Surface
the (cache, bw, saturated) unreachability finding with forward
pointer to Results.
3. Table 1 caption: "Canonical Core spanning distinct bottleneck
classes" -> "spanning distinct positions on the three-axis regime
grid." Note that measurements were taken on M5 Max (dev host) while
M1 base remains the canonical reference platform.
4. Table 1 NanoGPT row: 85.9M -> 11.1M. Dataset label
"TinyShakes." -> "TinyShakes. (char)".
5. NanoGPT prose (line 306 post-edit): 85.9M -> 11.1M with an honest
one-sentence acknowledgement that the initial 85.9M figure was
inflated by a char-level-unreachable BPE vocabulary, reconciled
during iteration. Add iter-3's visceral "1 decode step = 1175
prefill tokens of throughput" headline with forward pointer to
NanoGPT-Decode in Results.
6. Convergence analysis (line 396): the "17.4M vs 85.9M" parameter
comparison is recomputed as "17.4M top-2 routed -> ~4.4M active vs
NanoGPT 11.1M dense." Restated claim: expert specialization delivers
better loss per active FLOP, not "despite fewer total parameters"
(which is no longer true post-reclassification).
7. Bottleneck Analysis (line 429): split into three paragraphs. Keep
the DLRM-vs-NanoGPT intensity contrast. Add new paragraph "An
unreachable cell, made explicit" documenting the (cache,bw,sat)
structural impossibility on PyTorch+MPS and naming the production
fusion stacks (vLLM, TensorRT-LLM, llama.cpp, MLX) that bridge it.
Add "Measurement-driven reclassification" paragraph naming the
three most surprising measurement-driven revisions (ResNet-18 bw-
bound, Micro-DLRM compute-bound, Micro-Diffusion bw-bound) and
explaining why the unmeasured-grey-band policy prevented these
mislabels from shipping.
8. Inline comment line 502: 85.9M -> 11.1M (char-level) consistency.
LaTeX structure unchanged (same \begin/\end count); all cross-refs
resolve. Does not touch figures, bibliography, or the appendix.
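The arithmetic behind edit #6's "~4.4M active" recomputation can be sketched; the expert count is not stated above, so the sketch assumes top-2 routing over 8 equal experts (an assumption that makes the numbers land):

```python
def active_params(total_routed_params: float, num_experts: int, top_k: int) -> float:
    """Parameters a single token actually touches under top-k routing,
    assuming the routed parameters are split evenly across experts."""
    return total_routed_params * top_k / num_experts

# 17.4M routed, top-2 of 8 experts -> 4.35M active, i.e. the "~4.4M"
# figure, versus NanoGPT's 11.1M dense parameters all active per token.
moe_active = active_params(17.4e6, num_experts=8, top_k=2)
```

This is why the restated claim is per active FLOP: the MoE touches fewer parameters per token than the dense baseline, not fewer parameters overall.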
20 of 20 workloads now schema-valid; 9 of 11 measurable workloads have
evidence-bound regime values backed by sidecars in roofline/. The
linter passes --verify-against-sidecars across the suite. 13 prior
guess-classifications were corrected by measurement; the surprises
(DLRM compute-bound, ResNet bandwidth-bound, Diffusion bandwidth-bound)
will inform paper prose. Branch parked.
Folds in: bench/measure_peaks.py (real per-machine peak FLOPS + BW
measurement), roofline.py reading from cache, manifest.py rejecting
dirty trees on closed division, check_taxonomy.py
--verify-against-sidecars flag, nanogpt_prefill emitting sidecars.
Empirical findings: hardcoded M1 peaks were 5.5-7.7x off for this
machine (M-series Pro/Max). The verify-against-sidecars flag caught
a YAML claim that didn't survive real measurement (nanogpt-prefill
dispatch claim was calibrated against wrong peaks).
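The regime call reduces to comparing a workload's arithmetic intensity against the machine's measured ridge point; a sketch with hypothetical peaks (measure_peaks.py supplies the real per-machine values):

```python
def ridge_point(peak_flops: float, peak_bw_bytes: float) -> float:
    """Machine balance in FLOP/byte: the roofline knee. Workloads below
    it are bandwidth-bound; workloads above it are compute-bound."""
    return peak_flops / peak_bw_bytes

def classify(intensity: float, peak_flops: float, peak_bw_bytes: float) -> str:
    return ("compute-bound"
            if intensity >= ridge_point(peak_flops, peak_bw_bytes)
            else "bandwidth-bound")

# Hypothetical peaks: 10 TFLOP/s and 200 GB/s put the ridge at
# 50 FLOP/byte; a 5.5-7.7x error in the peaks moves that knee by the
# same factor, which is how a dispatch claim can flip under remeasurement.
```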
Branch parked. 6 of 10 iterations complete (counting 5.5).
Iter-5 from standalone: real Merkle-style provenance manifest
(src/mlperf/manifest.py) replacing the iter-1 era str(report) self-hash,
plus a roofline-coordinate emitter (src/mlperf/roofline.py) that
populates the iter-4 taxonomy axes empirically.
Smoke test: tamper detection works (mutated 1 byte -> weights.sha256
FAIL). Roofline emitter SNR 179x (gate >= 4x).
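The tamper check is plain content hashing; a minimal sketch of the manifest verification (names hypothetical, the real manifest.py covers a tree of artifacts):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def manifest_check(data: bytes, recorded_digest: str) -> bool:
    """Recompute the content hash and compare against the digest
    recorded in the provenance manifest at build time."""
    return sha256_hex(data) == recorded_digest

weights = b"\x00" * 1024
recorded = sha256_hex(weights)
tampered = bytes([weights[0] ^ 1]) + weights[1:]   # flip a single bit
```

With `weights` the check passes; with `tampered` it fails, which is the weights.sha256 FAIL the smoke test demonstrates.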
Working group sign-off: Dean (proposer + verifier).
Branch parked; not for merge to dev. 5 of 10 iterations complete.
Snapshots iter-4 from standalone repo. Adds tools/check_taxonomy.py
(linter that gates schema completeness + threshold consistency) and
the migration that converted all 20 workload entries to use the new
3-axis regime schema. Working group sign-off: Emer (proposer + verifier).
Branch parked; not for merge to dev.
Snapshots iter-3 from the standalone repo. Adds:
- Real KV-cache plumbing in gpt2_infer.py (CausalSelfAttention,
GPTBlock, GPT2WhiteBox now support use_kv_cache + past_key_values).
- NanoGPTWhiteBox unified forward signature returning either
(logits, loss) for training or (logits, present_kvs) for inference.
max_seq_len bumped 1024 -> 2048 per Dean's sizing math.
- Two new workloads (nanogpt-prefill, nanogpt-decode) sharing the
same trained checkpoint. Prefill demonstrates compute-bound
behavior (~289 FLOP/byte at ctx=1792); decode demonstrates the
bandwidth-bound regime (~0.5 FLOP/byte) that dominates LLM serving.
- smoke_nanogpt_phases.py harness with intensity-ratio gate >= 5x;
measured 578x on M-series MPS.
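The two intensity figures follow from first-order FLOP/byte accounting; a sketch under fp32 weights, counting only weight traffic (so the prefill number is an upper bound — the measured ~289 at ctx=1792 sits below it once activation and KV-cache traffic are included):

```python
def decode_intensity(bytes_per_weight: float = 4.0) -> float:
    """Matrix-vector decode: 2 FLOPs (multiply + add) per weight
    element, with each weight streamed from memory once per token."""
    return 2.0 / bytes_per_weight

def prefill_intensity(tokens: int, bytes_per_weight: float = 4.0) -> float:
    """Prefill amortizes each weight load across every token in the
    context, so intensity grows linearly with context length."""
    return 2.0 * tokens / bytes_per_weight
```

The decode figure comes out to exactly the ~0.5 FLOP/byte quoted above, which is why decode dominates LLM serving from the bandwidth-bound side of the roofline.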
Working group sign-off: Dean (proposer + verifier).
Branch parked; not for merge to dev. Three iterations complete; seven
remaining per the autonomous loop plan.
Snapshots the autonomous-iteration work happening in the standalone
/Users/VJ/GitHub/mlperf-edu/ repo. Two iterations folded in:
iter-1: code-defect cleanup (Patterson + Dean sign-off)
- Remove dead simulated_loss + load_real_wikitext_data from
nanogpt_train.py; align NanoGPTWhiteBox vocab to char-level
(50,257 -> 128, dropping 19.3M unused embedding params).
- Fix two broken examples.{edge,mobile} imports in inference paths.
- Reconcile README benchmark table with workloads.yaml (was wrong
on 7 of 16 workloads).
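The "19.3M unused embedding params" figure from iter-1 is recoverable by arithmetic; the embedding width is not stated here, so this sketch assumes n_embd=384 (a common char-level NanoGPT width) and tied input/output embeddings:

```python
def embedding_savings(old_vocab: int, new_vocab: int, n_embd: int) -> int:
    """Parameters dropped from the token-embedding table when the vocab
    shrinks (tied embeddings, so the table is counted once)."""
    return (old_vocab - new_vocab) * n_embd

saved = embedding_savings(50_257, 128, n_embd=384)   # 19,249,536, ~19.25M
```

Under these assumptions the computed savings land within rounding of the quoted 19.3M.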
iter-2: DLRM DRAM-resident variant (Emer sign-off)
- New MicroDLRMDRAM with 2M-row hash-mapped virtual EmbeddingBag,
sized so the per-batch byte transfer (8 MB at B=8192, m_spa=256)
takes long enough to clear PyTorch's ~50 us dispatch floor and
exhibit the bandwidth-bound regime production DLRM lives in.
- Smoke test asserts pure-lookup gap >= 3x; current host shows
4.29x end-to-end and 3.49x lookup-only.
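The sizing arithmetic behind the 8 MB figure, plus the dispatch-floor comparison at an assumed sustained bandwidth (the 100 GB/s is illustrative, not a measured peak):

```python
def embedding_batch_bytes(batch: int, m_spa: int, dtype_bytes: int = 4) -> int:
    """Bytes of pooled embedding output per batch: one fp32 row of
    width m_spa per sample."""
    return batch * m_spa * dtype_bytes

per_batch = embedding_batch_bytes(8192, 256)     # 8,388,608 B = 8 MiB
transfer_us = per_batch / 100e9 * 1e6            # ~84 us at an assumed 100 GB/s
# ~84 us > ~50 us dispatch floor, so kernel time is dominated by memory
# traffic rather than launch overhead: the bandwidth-bound regime.
```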
Branch is parked; not for merge to dev. Iteration log lives in the
standalone repo under .iteration_log/ (gitignored locally).