20 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
400f0e3027 docs: clarify MLSysBook ecosystem paths
Align public README and site messaging around the curriculum components, adoption paths, and current early-release status so newcomers can move from reading to building, deploying, practicing, and teaching.
2026-04-25 08:48:38 -04:00
Vijay Janapa Reddi
9f20e7f20d docs(readmes): add language hints to bare code fences (markdownlint MD040)
Add `text` language tag to 25 unlabeled fenced code blocks across the
public-facing READMEs. Mostly directory-tree listings, all-contributors
bot instructions, and pseudo-output ASCII blocks — none were getting
syntax highlighting anyway, but the explicit tag silences markdownlint
MD040 and signals intent ("this is plain text, not a forgotten lang").
2026-04-22 16:56:08 -04:00
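The sweep described above is mechanical enough to script. A minimal sketch of such a pass, assuming triple-backtick fences (the function name is hypothetical, not from the repo):

```python
def label_bare_fences(markdown: str, lang: str = "text") -> str:
    """Add a language tag to opening code fences that have none.

    Tracks open/close fence state so closing fences are left untouched,
    and already-labeled fences (```python etc.) pass through unchanged.
    """
    out = []
    in_fence = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            if not in_fence:
                # Opening fence: tag it only if it is bare.
                if stripped == "```":
                    line = line.replace("```", "```" + lang, 1)
                in_fence = True
            else:
                in_fence = False
        out.append(line)
    return "\n".join(out)
```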
Vijay Janapa Reddi
434417d69f docs(readmes): force table width via inline style (override GitHub CSS)
GitHub's github-markdown-css applies:
  .markdown-body table { display: block; width: max-content; max-width: 100%; }

The HTML width="100%" attribute is a presentational hint with lower
specificity than the class selector, so tables with short cell content
were sizing to max-content and not stretching to fill the column.
Tables with long sentences per cell stretched fine, masking the bug.

Add inline style="width:100%" (specificity 1,0,0,0) which overrides
the class-selector rule. Keep width="100%" attribute as a fallback for
non-GitHub renderers (VSCode preview, GitLab, plain HTML viewers).

54 tables updated across 10 READMEs + the two contributor-sync scripts
that regenerate auto-managed tables.
2026-04-22 16:20:38 -04:00
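The contributor-sync scripts mentioned above apply this pattern when they regenerate tables; a rough sketch of the rewrite (regex and function name illustrative, not the scripts' actual code):

```python
import re

# Match <table ...> opening tags that do not already carry an inline style.
TABLE_RE = re.compile(r'<table\b(?![^>]*style=)([^>]*)>', re.IGNORECASE)

def force_table_width(html: str) -> str:
    """Add style="width:100%", which outranks github-markdown-css's
    class-selector rule, while keeping any existing width="100%"
    attribute as a fallback for non-GitHub renderers."""
    return TABLE_RE.sub(r'<table\1 style="width:100%">', html)
```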
Vijay Janapa Reddi
eb27858591 docs(readmes): replace HTML card pattern with native GitHub callouts
The sub-project READMEs used an old-school nested-table card design
with hardcoded bgcolor="#ffffff", "#cfd6dd", "#eef2f7" plus deprecated
HTML4 attributes (cellpadding, cellspacing, border). It looked good in
light mode but produced harsh white islands in GitHub's dark theme,
which is what most readers see today.

Across 11 sub-READMEs:

- Strip the card wrapper so data tables are just clean
  <table width="100%"> with semantic <thead>/<tbody>. Headers keep
  their column widths; bgcolor/valign/zebra-stripe cruft is removed
  (GitHub provides its own theme-aware striping).
- Convert the early-release callouts (and mlperf-edu's two-tier
  status block + "source of truth" note + interviews' two info boxes)
  to GitHub-native > [!NOTE] / > [!WARNING] / > [!TIP] callouts.
  These are theme-aware, get proper icons, and render correctly in
  light AND dark mode.

Net result: 528 lines of HTML cruft removed, 230 lines of clean
markdown added. Visual identity is preserved (callouts still stand
out, tables still stretch full-width) while becoming dark-mode safe
and consistent with the main README.
2026-04-22 16:12:20 -04:00
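The callout half of the conversion is a simple text transform; a minimal sketch of emitting the GitHub-native syntax (the helper is illustrative, not code from the repo):

```python
def make_callout(kind: str, lines: list[str]) -> str:
    """Render text as a GitHub-native callout block.

    kind is one of NOTE, TIP, IMPORTANT, WARNING, CAUTION; every body
    line is prefixed with "> " so the whole block stays one blockquote.
    """
    body = "\n".join(f"> {line}" if line else ">" for line in lines)
    return f"> [!{kind.upper()}]\n{body}"
```

GitHub renders these with theme-aware colors and icons, which is what makes them safe in both light and dark mode.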
Vijay Janapa Reddi
59ecd34f51 docs(readme): standardize wide HTML tables across product READMEs
- Add wrap_readme_data_tables.py to frame <table>+<thead>/<tbody> blocks in a
  98% width panel (#cfd6dd border, #eef2f7 headers, zebra body rows where
  applied manually in converted tables).
- Apply wraps to book, kits, labs, slides, tinytorch; tbody wraps for kits
  docs/related and instructors overview.
- Convert remaining Markdown tables in mlsysim, mlperf-edu, and interviews to
  the same HTML pattern; replace StaffML markdown callouts with HTML panels.
- Add thead rows to kits/instructors body-only tables for clearer hierarchy.
2026-04-21 08:51:04 -04:00
Vijay Janapa Reddi
ee1d7ac814 docs(readme): mlperf-edu under construction + unify status banner markers
- mlperf-edu: stacked HTML panels (amber under construction + gray early-work
  note) at top; replace workload-table markdown quote with HTML callout.
- slides, labs, mlsysim: rename DEV-BANNER comments to EARLY-RELEASE-CALLOUT
  for consistency with other project READMEs.
2026-04-21 08:45:29 -04:00
Vijay Janapa Reddi
1bb6ac9780 docs(readme): add early-release HTML callouts to book, kits, mlperf-edu, site
Align contributor-facing trees with the same centered table banner used
elsewhere. Add site/README.md describing the unified landing Quarto project.
2026-04-21 08:27:22 -04:00
Vijay Janapa Reddi
3a42c025df fix(ci): unblock pre-commit after cascade of latent regressions
Every hook now passes on `pre-commit run --all-files` (exit 0 after one
auto-fix pass + one verification pass — the standard pre-commit contract).
Unblocks book-validate-dev, which has been red on various hooks since the
mlsysim.core import failure finally cleared.

Fixes applied (source-traced, not suppressed):

1. codespell: 'OT' in mlperf-edu/reference/cloud/micro_lstm.py is the
   column name for Oil Temperature in the ETTh1 dataset (Zhou et al.,
   AAAI 2021), not a typo for 'to/of/or/not/it'. Added 'ot' to
   .codespell-ignore-words.txt (case-insensitive, covers OT).

2. bib-lint §5 bibliography hygiene: 12 entries in
   mlperf-edu/paper/refs.bib missing required publisher/journal per the
   canonical mapping in book-prose-merged.md §5. Added canonical
   publishers (MLSys → mlsys.org, ICLR → OpenReview.net, CVPR → IEEE,
   NAACL → ACL, etc.); promoted krizhevsky2009cifar from @article to
   @techreport with institution = University of Toronto. banbury2021mlperf
   uses Curran Associates Inc. (pre-2022 NeurIPS rule); flagged
   banbury2024wakevision for author review since the booktitle says CVPR
   but web verification suggests it is still an arXiv preprint.

3. Over-eager 'vs.' style sweep corrupted anchor IDs: 77 instances of
   -vs.- inside {#sec-...}, {#tbl-...}, {#fig-...} definitions and their
   @-references across 27 QMDs. Anchor IDs must be literal strings without
   periods per the repo's own section-ID naming rule; stripped the period
   from all anchor tokens while preserving 'vs.' in visible prose.

4. 4 broken SVG filename references from the same sweep (pam4-vs.-nrz,
   traditional-vs.-ml-fleet, tco-build-vs.-buy, centralized-vs.-decentralized)
   — filenames on disk use vs- (no period) so refs restored to match.

5. Malformed XML declaration in bathtub-curve.svg:
   '<?xml version="utf-8"?>' → '<?xml version="1.0" encoding="utf-8"?>'.

6. 21 quad-asterisks (****term****) in training.qmd collapsed to **term**.

7. bibtex-tidy auto-reformatted mlperf-edu/paper/refs.bib (alphabetical
   order + consistent indentation + wrapped author lists) and pipe-table
   prettifier realigned columns across ~20 QMDs. These are all cosmetic
   formatter output — no content changes.

Verified: pre-commit run --all-files run #1 modified files (exit 1),
run #2 exit 0 with 61 Passed / 0 Failed.
2026-04-20 17:58:52 -04:00
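Fix #3 above (stripping periods from anchor tokens without touching 'vs.' in prose) comes down to targeting only `{#...}` definitions and `@`-references; a sketch under that assumption (regexes illustrative, not the repo's actual hook):

```python
import re

# Periods inside {#sec-...}, {#tbl-...}, {#fig-...} definitions.
ANCHOR_DEF = re.compile(r'(\{#(?:sec|tbl|fig)-[^}]*)\.([^}]*\})')
# Periods inside @sec-/@tbl-/@fig- references.
ANCHOR_REF = re.compile(r'(@(?:sec|tbl|fig)-[\w-]*)\.(-[\w-]*)')

def strip_anchor_periods(text: str) -> str:
    """Remove periods from anchor IDs and refs; leave visible prose alone.

    Loops because an ID could contain more than one period.
    """
    while ANCHOR_DEF.search(text):
        text = ANCHOR_DEF.sub(r'\1\2', text)
    while ANCHOR_REF.search(text):
        text = ANCHOR_REF.sub(r'\1\2', text)
    return text
```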
Vijay Janapa Reddi
e1863d1a38 fix(mlperf-edu): use script-relative paths for paper figure outputs
Replace hardcoded /Users/VJ/GitHub/mlperf-edu/paper/figures/... paths
in generate_all_curves.py and generate_all_curves_v2.py with paths
derived from os.path.dirname(__file__), so the figure-generation
scripts work for any user/checkout location.
2026-04-20 14:52:49 -04:00
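The replacement pattern is the standard one; a minimal sketch of deriving a figures directory from the script's own location (helper name hypothetical):

```python
import os

def figure_dir(script_file: str) -> str:
    """Directory for paper figures, derived from the calling script's
    own location (pass __file__) instead of a hardcoded absolute path,
    so the generators work from any checkout location."""
    return os.path.join(os.path.dirname(os.path.abspath(script_file)), "figures")
```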
Vijay Janapa Reddi
b1592a6956 paper.tex: bake in iter-1..10 findings (final synthesis)
Eight surgical updates to the paper to reflect what the 10-iteration loop
discovered. Does NOT rewrite the whole paper; preserves structure and
contribution ordering.

1. Contribution #3 rewrite (line 180): replace "Bottleneck-diverse
   canonical core" with "Three-axis regime taxonomy." Introduces the
   orthogonality argument (batch size shifts dispatch, table size shifts
   working set, prefill->decode shifts intensity) as the paper's new
   primary classification frame.

2. Workload Suite intro (line 234): replace "five fundamentally distinct
   system bottleneck classes" with the 3-axis framing. Add a paragraph
   ("Regime classification") describing the M1-base reference
   thresholds, the unmeasured grey-band policy, and the iter-5.6 finding
   that 15 static-analysis labels did not survive measurement. Surface
   the (cache, bw, saturated) unreachability finding with forward
   pointer to Results.

3. Table 1 caption: "Canonical Core spanning distinct bottleneck
   classes" -> "spanning distinct positions on the three-axis regime
   grid." Note that measurements were taken on M5 Max (dev host) while
   M1 base remains the canonical reference platform.

4. Table 1 NanoGPT row: 85.9M -> 11.1M. Dataset label
   "TinyShakes." -> "TinyShakes. (char)".

5. NanoGPT prose (line 306 post-edit): 85.9M -> 11.1M with an honest
   one-sentence acknowledgement that the initial 85.9M figure was
   inflated by a char-level-unreachable BPE vocabulary, reconciled
   during iteration. Add iter-3's visceral "1 decode step = 1175
   prefill tokens of throughput" headline with forward pointer to
   NanoGPT-Decode in Results.

6. Convergence analysis (line 396): the "17.4M vs 85.9M" parameter
   comparison is recomputed as "17.4M top-2 routed -> ~4.4M active vs
   NanoGPT 11.1M dense." Restated claim: expert specialization delivers
   better loss per active FLOP, not "despite fewer total parameters"
   (which is no longer true post-reclassification).

7. Bottleneck Analysis (line 429): split into three paragraphs. Keep
   the DLRM-vs-NanoGPT intensity contrast. Add new paragraph "An
   unreachable cell, made explicit" documenting the (cache,bw,sat)
   structural impossibility on PyTorch+MPS and naming the production
   fusion stacks (vLLM, TensorRT-LLM, llama.cpp, MLX) that bridge it.
   Add "Measurement-driven reclassification" paragraph naming the
   three most surprising measurement-driven revisions (ResNet-18 bw-
   bound, Micro-DLRM compute-bound, Micro-Diffusion bw-bound) and
   explaining why the unmeasured-grey-band policy prevented these
   mislabels from shipping.

8. Inline comment line 502: 85.9M -> 11.1M (char-level) consistency.

LaTeX structure unchanged (same \begin/\end count); all cross-refs
resolve. Does not touch figures, bibliography, or the appendix.
2026-04-17 14:57:51 -04:00
Vijay Janapa Reddi
b693a0832d mlperf-edu: sync iters 7-10 (LoRA + compression + cost+DQ + distributed) 2026-04-16 18:28:49 -04:00
Vijay Janapa Reddi
d16c7585c8 mlperf-edu: sync iter-6 (LLM serving, 23 workloads, 16 measured) 2026-04-16 17:48:30 -04:00
Vijay Janapa Reddi
41a5e3d20a mlperf-edu: sync iter-5.6 follow-up (13 of 20 workloads measured) 2026-04-16 17:10:40 -04:00
Vijay Janapa Reddi
599fd0b39a mlperf-edu: sync iter-5.6 (bulk regime measurement + YAML sync)
20 of 20 workloads now schema-valid; 9 of 11 measurable workloads have
evidence-bound regime values backed by sidecars in roofline/. The
linter passes --verify-against-sidecars across the suite. 13 prior
guess-classifications were corrected by measurement; the surprises
(DLRM compute-bound, ResNet bandwidth-bound, Diffusion bandwidth-bound)
will inform paper prose. Branch parked.
2026-04-16 17:07:03 -04:00
Vijay Janapa Reddi
a88d77e63f mlperf-edu: sync iter-5.5 (integration sweep)
Folds in: bench/measure_peaks.py (real per-machine peak FLOPS + BW
measurement), roofline.py reading from cache, manifest.py rejecting
dirty trees on closed division, check_taxonomy.py
--verify-against-sidecars flag, nanogpt_prefill emitting sidecars.

Empirical findings: hardcoded M1 peaks were 5.5-7.7x off for this
machine (M-series Pro/Max). The verify-against-sidecars flag caught
a YAML claim that didn't survive real measurement (nanogpt-prefill
dispatch claim was calibrated against wrong peaks).

Branch parked. 6 of 10 iterations complete (counting 5.5).
2026-04-16 15:31:44 -04:00
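The 5.5-7.7x gap shows why peaks must be measured per machine rather than hardcoded. A far cruder probe than bench/measure_peaks.py presumably is, but illustrative of the idea (timing a large copy to estimate sustained bandwidth; all numbers are machine-dependent):

```python
import time

def measure_copy_bandwidth(nbytes: int = 1 << 26, reps: int = 5) -> float:
    """Rough sustained-bandwidth probe in GB/s: best-of-reps timing of a
    large buffer copy, counting one read plus one write of nbytes."""
    src = bytearray(nbytes)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        dst = bytes(src)  # one read + one write of nbytes
        best = min(best, time.perf_counter() - t0)
    assert len(dst) == nbytes
    return (2 * nbytes) / best / 1e9
```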
Vijay Janapa Reddi
9aa876e2ed mlperf-edu: sync iter-5 (provenance + roofline emitter)
Iter-5 from standalone: real Merkle-style provenance manifest
(src/mlperf/manifest.py) replacing the iter-1 era str(report) self-hash,
plus a roofline-coordinate emitter (src/mlperf/roofline.py) that
populates the iter-4 taxonomy axes empirically.

Smoke test: tamper detection works (mutated 1 byte -> weights.sha256
FAIL). Roofline emitter SNR 179x (gate >= 4x).

Working group sign-off: Dean (proposer + verifier).
Branch parked; not for merge to dev. 5 of 10 iterations complete.
2026-04-16 15:27:03 -04:00
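The tamper-detection property in the smoke test (mutate one byte, verification fails) falls out of any content-addressed manifest; a minimal sketch of the idea, not src/mlperf/manifest.py itself:

```python
import hashlib
import json

def build_manifest(files: dict) -> dict:
    """Hash each artifact, then hash the sorted digest map into a single
    root, so mutating one byte anywhere changes the root."""
    leaves = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    root_src = json.dumps(leaves, sort_keys=True).encode()
    return {"leaves": leaves, "root": hashlib.sha256(root_src).hexdigest()}

def verify(files: dict, manifest: dict) -> bool:
    """Recompute the root from the current files and compare."""
    return build_manifest(files)["root"] == manifest["root"]
```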
Vijay Janapa Reddi
1e4f43d35a mlperf-edu: sync iter-4 (3-axis taxonomy + check_taxonomy.py)
Snapshots iter-4 from standalone repo. Adds tools/check_taxonomy.py
(linter that gates schema completeness + threshold consistency) and
the migration that converted all 20 workload entries to use the new
3-axis regime schema. Working group sign-off: Emer (proposer + verifier).
Branch parked; not for merge to dev.
2026-04-16 15:16:49 -04:00
Vijay Janapa Reddi
30f80aaf1f mlperf-edu: sync iter-3 (NanoGPT prefill/decode split)
Snapshots iter-3 from the standalone repo. Adds:
  - Real KV-cache plumbing in gpt2_infer.py (CausalSelfAttention,
    GPTBlock, GPT2WhiteBox now support use_kv_cache + past_key_values).
  - NanoGPTWhiteBox unified forward signature returning either
    (logits, loss) for training or (logits, present_kvs) for inference.
    max_seq_len bumped 1024 -> 2048 per Dean's sizing math.
  - Two new workloads (nanogpt-prefill, nanogpt-decode) sharing the
    same trained checkpoint. Prefill demonstrates compute-bound
    behavior (~289 FLOP/byte at ctx=1792); decode demonstrates the
    bandwidth-bound regime (~0.5 FLOP/byte) that dominates LLM serving.
  - smoke_nanogpt_phases.py harness with intensity-ratio gate >= 5x;
    measured 578x on M-series MPS.

Working group sign-off: Dean (proposer + verifier).
Branch parked; not for merge to dev. Three iterations complete; seven
remaining per the autonomous loop plan.
2026-04-16 15:08:22 -04:00
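The prefill/decode intensity gap above can be seen in a back-of-envelope matmul model: prefill pushes many tokens through the weights at once, decode streams the same weights for one token per step. This toy model (no cache reuse, fp32, illustrative dimensions) will not reproduce the commit's exact ~289 and ~0.5 FLOP/byte figures, but it lands on the same side of the compute/bandwidth divide:

```python
def matmul_intensity(m: int, n: int, k: int, bytes_per_elem: int = 4) -> float:
    """FLOP-per-byte of an m×k @ k×n matmul: 2mnk FLOPs over reading
    A and B and writing C exactly once (no cache reuse modeled)."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

# Prefill: a tall activation matrix amortizes each weight read.
prefill = matmul_intensity(m=1792, n=768, k=768)
# Decode: one token per step (m=1) re-reads the full weight matrix.
decode = matmul_intensity(m=1, n=768, k=768)
```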
Vijay Janapa Reddi
efaa075ba8 mlperf-edu: sync iter-1 and iter-2 from standalone repo
Snapshots the autonomous-iteration work happening in the standalone
/Users/VJ/GitHub/mlperf-edu/ repo. Two iterations folded in:

  iter-1: code-defect cleanup (Patterson + Dean sign-off)
    - Remove dead simulated_loss + load_real_wikitext_data from
      nanogpt_train.py; align NanoGPTWhiteBox vocab to char-level
      (50,257 -> 128, dropping 19.3M unused embedding params).
    - Fix two broken examples.{edge,mobile} imports in inference paths.
    - Reconcile README benchmark table with workloads.yaml (was wrong
      on 7 of 16 workloads).

  iter-2: DLRM DRAM-resident variant (Emer sign-off)
    - New MicroDLRMDRAM with 2M-row hash-mapped virtual EmbeddingBag,
      sized so per-batch byte transfer (8 MB at B=8192, m_spa=256)
      exceeds PyTorch's ~50 us dispatch floor and exhibits the
      bandwidth-bound regime production DLRM lives in.
    - Smoke test asserts pure-lookup gap >= 3x; current host shows
      4.29x end-to-end and 3.49x lookup-only.

Branch is parked; not for merge to dev. Iteration log lives in the
standalone repo under .iteration_log/ (gitignored locally).
2026-04-16 14:59:42 -04:00
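The iter-2 sizing math checks out: 8192 rows × 256 floats × 4 bytes is exactly 8 MiB per batch. A one-liner to make the arithmetic explicit (helper name hypothetical):

```python
def dlrm_batch_bytes(batch: int, emb_dim: int, bytes_per_elem: int = 4) -> int:
    """Bytes gathered per batch by one dense EmbeddingBag lookup."""
    return batch * emb_dim * bytes_per_elem

# Commit's sizing: B=8192, m_spa=256 → 8 MiB per batch, large enough
# that transfer time dominates PyTorch's ~50 us dispatch floor.
assert dlrm_batch_bytes(8192, 256) == 8 * 1024 * 1024
```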
Vijay Janapa Reddi
a9878ad6bd feat: import mlperf-edu pedagogical benchmark suite
Snapshot of the standalone /Users/VJ/GitHub/mlperf-edu/ repo as of
2026-04-16, brought into MLSysBook as a parked feature branch for
backup and iteration. Not for merge to dev.

Contents (88 files, ~2.3 MB):
- 16 reference workloads (cloud / edge / tiny / agent divisions)
- LoadGen proxy harness + SUT plugin protocol
- Compliance checker, autograder, hardware fingerprint
- Paper draft (paper.tex) with TikZ/SVG figure sources
- Three lab examples + practitioner workflow configs
- Workload + dataset YAML registries (single source of truth)

Excluded (per mlperf-edu/.gitignore + size constraints):
- Datasets (6.6 GB), checkpoints (260 MB), gpt2 weights (523 MB)
- Generated PDFs, .venv, build artifacts
2026-04-16 14:15:05 -04:00