Align public README and site messaging around the curriculum components,
adoption paths, and current early-release status so newcomers can move
from reading to building, deploying, practicing, and teaching.
Add `text` language tag to 25 unlabeled fenced code blocks across the
public-facing READMEs. Mostly directory-tree listings, all-contributors
bot instructions, and pseudo-output ASCII blocks — none were getting
syntax highlighting anyway, but the explicit tag silences markdownlint
MD040 and signals intent ("this is plain text, not a forgotten lang").
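The tagging pass is mechanical enough to script; a sketch of the transformation (hypothetical helper — the actual edits may have been applied by hand):

```python
def tag_bare_fences(md: str) -> str:
    """Label opening ``` fences that carry no language as `text`, which
    silences markdownlint MD040. Closing fences (the ones that follow an
    open fence) are left untouched."""
    out, in_block = [], False
    for line in md.splitlines():
        if line.strip().startswith("```"):
            if not in_block and line.strip() == "```":
                line = line.replace("```", "```text", 1)
            in_block = not in_block
        out.append(line)
    return "\n".join(out)
```

Tracking the open/close state is what keeps the pass from mangling closing fences, which are also bare ``` lines.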
GitHub's github-markdown-css applies:
.markdown-body table { display: block; width: max-content; max-width: 100%; }
The HTML width="100%" attribute is only a presentational hint, which
the cascade ranks below any author-stylesheet selector, so tables with
short cell content were sizing to max-content instead of stretching to
fill the column.
Tables with long sentences per cell stretched fine, masking the bug.
Add inline style="width:100%" (specificity 1,0,0,0) which overrides
the class-selector rule. Keep width="100%" attribute as a fallback for
non-GitHub renderers (VSCode preview, GitLab, plain HTML viewers).
54 tables updated across 10 READMEs + the two contributor-sync scripts
that regenerate auto-managed tables.
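The fix reduces to a one-line substitution; a sketch of what the sync scripts now have to emit (function name hypothetical):

```python
import re

def force_full_width(html: str) -> str:
    """Give bare <table width="100%"> tags an inline style so the
    declaration outranks github-markdown-css's class-selector rule,
    while keeping the attribute as a fallback for other renderers."""
    return re.sub(
        r'<table width="100%">',
        '<table width="100%" style="width:100%">',
        html,
    )
```

Tables that already carry a style attribute don't match the pattern, so re-running the pass is idempotent.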
The sub-project READMEs used an old-school nested-table card design
with hardcoded bgcolor="#ffffff", "#cfd6dd", "#eef2f7" plus deprecated
HTML4 attributes (cellpadding, cellspacing, border). It looked good in
light mode but produced harsh white islands in GitHub's dark theme,
which is what most readers see today.
Across 11 sub-READMEs:
- Strip the card wrapper so data tables are just clean
<table width="100%"> with semantic <thead>/<tbody>. Headers keep
their column widths; bgcolor/valign/zebra-stripe cruft is removed
(GitHub provides its own theme-aware striping).
- Convert the early-release callouts (and mlperf-edu's two-tier
status block + "source of truth" note + interviews' two info boxes)
to GitHub-native > [!NOTE] / > [!WARNING] / > [!TIP] callouts.
These are theme-aware, get proper icons, and render correctly in
light AND dark mode.
Net result: 528 lines of HTML cruft removed, 230 lines of clean
markdown added. Visual identity is preserved (callouts still stand
out, tables still stretch full-width) while becoming dark-mode safe
and consistent with the main README.
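The conversion target is simple to state precisely; a sketch of the emitter (helper name hypothetical):

```python
def to_github_callout(kind: str, body_lines: list[str]) -> list[str]:
    """Render a GitHub-native callout: a > [!KIND] marker line followed
    by the body, each line blockquoted. GitHub supplies the icon and
    theme-aware styling, so no hardcoded colors survive into dark mode."""
    return [f"> [!{kind.upper()}]"] + [f"> {line}" for line in body_lines]
```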
- Add wrap_readme_data_tables.py to frame <table>+<thead>/<tbody> blocks in a
98% width panel (#cfd6dd border, #eef2f7 headers, and zebra body rows where
they had been applied manually in already-converted tables).
- Apply wraps to book, kits, labs, slides, tinytorch; tbody wraps for kits
docs/related and instructors overview.
- Convert remaining Markdown tables in mlsysim, mlperf-edu, and interviews to
the same HTML pattern; replace StaffML markdown callouts with HTML panels.
- Add thead rows to kits/instructors body-only tables for clearer hierarchy.
- mlperf-edu: stacked HTML panels (amber under construction + gray early-work
note) at top; replace workload-table markdown quote with HTML callout.
- slides, labs, mlsysim: rename DEV-BANNER comments to EARLY-RELEASE-CALLOUT
for consistency with other project READMEs.
Every hook now passes on `pre-commit run --all-files` (exit 0 after one
auto-fix pass + one verification pass — the standard pre-commit contract).
Unblocks book-validate-dev, which has been red on various hooks since the
mlsysim.core import failure finally cleared.
Fixes applied (source-traced, not suppressed):
1. codespell: 'OT' in mlperf-edu/reference/cloud/micro_lstm.py is the
column name for Oil Temperature in the ETTh1 dataset (Zhou et al.,
AAAI 2021), not a typo for 'to/of/or/not/it'. Added 'ot' to
.codespell-ignore-words.txt (case-insensitive, covers OT).
2. bib-lint §5 bibliography hygiene: 12 entries in
mlperf-edu/paper/refs.bib missing required publisher/journal per the
canonical mapping in book-prose-merged.md §5. Added canonical
publishers (MLSys → mlsys.org, ICLR → OpenReview.net, CVPR → IEEE,
NAACL → ACL, etc.); promoted krizhevsky2009cifar from @article to
@techreport with institution = University of Toronto. banbury2021mlperf
uses Curran Associates Inc. (pre-2022 NeurIPS rule); flagged
banbury2024wakevision for author review since the booktitle says CVPR
but web verification suggests it is still an arXiv preprint.
3. Over-eager 'vs.' style sweep corrupted anchor IDs: 77 instances of
-vs.- inside {#sec-...}, {#tbl-...}, {#fig-...} definitions and their
@-references across 27 QMDs. Anchor IDs must be literal strings without
periods per the repo's own section-ID naming rule; stripped the period
from all anchor tokens while preserving 'vs.' in visible prose.
4. 4 broken SVG filename references from the same sweep (pam4-vs.-nrz,
traditional-vs.-ml-fleet, tco-build-vs.-buy, centralized-vs.-decentralized)
— filenames on disk use vs- (no period) so refs restored to match.
5. Malformed XML declaration in bathtub-curve.svg:
'<?xml version="utf-8"?>' → '<?xml version="1.0" encoding="utf-8"?>'.
6. 21 quad-asterisks (****term****) in training.qmd collapsed to **term**.
7. bibtex-tidy auto-reformatted mlperf-edu/paper/refs.bib (alphabetical
order + consistent indentation + wrapped author lists) and pipe-table
prettifier realigned columns across ~20 QMDs. These are all cosmetic
formatter output — no content changes.
Verified: pre-commit run --all-files run #1 modified files (exit 1),
run #2 exit 0 with 61 Passed / 0 Failed.
Replace hardcoded /Users/VJ/GitHub/mlperf-edu/paper/figures/... paths
in generate_all_curves.py and generate_all_curves_v2.py with paths
derived from os.path.dirname(__file__), so the figure-generation
scripts work for any user/checkout location.
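The change amounts to anchoring paths at the script itself; a minimal sketch (the figures/ directory name is assumed from the old hardcoded path):

```python
import os

def figures_dir(script_path: str) -> str:
    """Resolve the figures/ output directory relative to the generating
    script, replacing the hardcoded /Users/... checkout path so the
    script runs from any clone location."""
    return os.path.join(os.path.dirname(os.path.abspath(script_path)), "figures")

# Inside generate_all_curves.py this would be called as figures_dir(__file__).
```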
Eight surgical updates to the paper to reflect what the 10-iteration loop
discovered. Does NOT rewrite the whole paper; preserves structure and
contribution ordering.
1. Contribution #3 rewrite (line 180): replace "Bottleneck-diverse
canonical core" with "Three-axis regime taxonomy." Introduces the
orthogonality argument (batch size shifts dispatch, table size shifts
working set, prefill->decode shifts intensity) as the paper's new
primary classification frame.
2. Workload Suite intro (line 234): replace "five fundamentally distinct
system bottleneck classes" with the 3-axis framing. Add a paragraph
("Regime classification") describing the M1-base reference
thresholds, the unmeasured grey-band policy, and the iter-5.6 finding
that 15 static-analysis labels did not survive measurement. Surface
the (cache, bw, saturated) unreachability finding with forward
pointer to Results.
3. Table 1 caption: "Canonical Core spanning distinct bottleneck
classes" -> "spanning distinct positions on the three-axis regime
grid." Note that measurements were taken on M5 Max (dev host) while
M1 base remains the canonical reference platform.
4. Table 1 NanoGPT row: 85.9M -> 11.1M. Dataset label
"TinyShakes." -> "TinyShakes. (char)".
5. NanoGPT prose (line 306 post-edit): 85.9M -> 11.1M with an honest
one-sentence acknowledgement that the initial 85.9M figure was
inflated by a char-level-unreachable BPE vocabulary, reconciled
during iteration. Add iter-3's visceral "1 decode step = 1175
prefill tokens of throughput" headline with forward pointer to
NanoGPT-Decode in Results.
6. Convergence analysis (line 396): the "17.4M vs 85.9M" parameter
comparison is recomputed as "17.4M top-2 routed -> ~4.4M active vs
NanoGPT 11.1M dense." Restated claim: expert specialization delivers
better loss per active FLOP, not "despite fewer total parameters"
(which is no longer true post-reclassification).
7. Bottleneck Analysis (line 429): split into three paragraphs. Keep
the DLRM-vs-NanoGPT intensity contrast. Add new paragraph "An
unreachable cell, made explicit" documenting the (cache,bw,sat)
structural impossibility on PyTorch+MPS and naming the production
fusion stacks (vLLM, TensorRT-LLM, llama.cpp, MLX) that bridge it.
Add "Measurement-driven reclassification" paragraph naming the
three most surprising measurement-driven revisions (ResNet-18 bw-
bound, Micro-DLRM compute-bound, Micro-Diffusion bw-bound) and
explaining why the unmeasured-grey-band policy prevented these
mislabels from shipping.
8. Inline comment line 502: 85.9M -> 11.1M (char-level) consistency.
LaTeX structure unchanged (same \begin/\end count); all cross-refs
resolve. Does not touch figures, bibliography, or the appendix.
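The arithmetic behind edit #6's "~4.4M active" recomputation can be sketched; the expert count is not stated above, so the sketch assumes top-2 routing over 8 equal experts (an assumption that makes the numbers land):

```python
def active_params(total_routed_params: float, num_experts: int, top_k: int) -> float:
    """Parameters a single token actually touches under top-k routing,
    assuming the routed parameters are split evenly across experts."""
    return total_routed_params * top_k / num_experts

# 17.4M routed, top-2 of 8 experts -> 4.35M active, i.e. the "~4.4M"
# figure, versus NanoGPT's 11.1M dense parameters all active per token.
moe_active = active_params(17.4e6, num_experts=8, top_k=2)
```

This is why the restated claim is per active FLOP: the MoE touches fewer parameters per token than the dense baseline, not fewer parameters overall.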
20 of 20 workloads now schema-valid; 9 of 11 measurable workloads have
evidence-bound regime values backed by sidecars in roofline/. The
linter passes --verify-against-sidecars across the suite. 13 prior
guess-classifications were corrected by measurement; the surprises
(DLRM compute-bound, ResNet bandwidth-bound, Diffusion bandwidth-bound)
will inform paper prose. Branch parked.
Folds in: bench/measure_peaks.py (real per-machine peak FLOPS + BW
measurement), roofline.py reading from cache, manifest.py rejecting
dirty trees on closed division, check_taxonomy.py
--verify-against-sidecars flag, nanogpt_prefill emitting sidecars.
Empirical findings: hardcoded M1 peaks were 5.5-7.7x off for this
machine (M-series Pro/Max). The verify-against-sidecars flag caught
a YAML claim that didn't survive real measurement (nanogpt-prefill
dispatch claim was calibrated against wrong peaks).
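The regime call reduces to comparing a workload's arithmetic intensity against the machine's measured ridge point; a sketch with hypothetical peaks (measure_peaks.py supplies the real per-machine values):

```python
def ridge_point(peak_flops: float, peak_bw_bytes: float) -> float:
    """Machine balance in FLOP/byte: the roofline knee. Workloads below
    it are bandwidth-bound; workloads above it are compute-bound."""
    return peak_flops / peak_bw_bytes

def classify(intensity: float, peak_flops: float, peak_bw_bytes: float) -> str:
    return ("compute-bound"
            if intensity >= ridge_point(peak_flops, peak_bw_bytes)
            else "bandwidth-bound")

# Hypothetical peaks: 10 TFLOP/s and 200 GB/s put the ridge at
# 50 FLOP/byte; a 5.5-7.7x error in the peaks moves that knee by the
# same factor, which is how a dispatch claim can flip under remeasurement.
```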
Branch parked. 6 of 10 iterations complete (counting 5.5).
Iter-5 from standalone: real Merkle-style provenance manifest
(src/mlperf/manifest.py) replacing the iter-1 era str(report) self-hash,
plus a roofline-coordinate emitter (src/mlperf/roofline.py) that
populates the iter-4 taxonomy axes empirically.
Smoke test: tamper detection works (mutated 1 byte -> weights.sha256
FAIL). Roofline emitter SNR 179x (gate >= 4x).
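The tamper check is plain content hashing; a minimal sketch of the manifest verification (names hypothetical, the real manifest.py covers a tree of artifacts):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def manifest_check(data: bytes, recorded_digest: str) -> bool:
    """Recompute the content hash and compare against the digest
    recorded in the provenance manifest at build time."""
    return sha256_hex(data) == recorded_digest

weights = b"\x00" * 1024
recorded = sha256_hex(weights)
tampered = bytes([weights[0] ^ 1]) + weights[1:]   # flip a single bit
```

With `weights` the check passes; with `tampered` it fails, which is the weights.sha256 FAIL the smoke test demonstrates.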
Working group sign-off: Dean (proposer + verifier).
Branch parked; not for merge to dev. 5 of 10 iterations complete.
Snapshots iter-4 from standalone repo. Adds tools/check_taxonomy.py
(linter that gates schema completeness + threshold consistency) and
the migration that converted all 20 workload entries to use the new
3-axis regime schema. Working group sign-off: Emer (proposer + verifier).
Branch parked; not for merge to dev.
Snapshots iter-3 from the standalone repo. Adds:
- Real KV-cache plumbing in gpt2_infer.py (CausalSelfAttention,
GPTBlock, GPT2WhiteBox now support use_kv_cache + past_key_values).
- NanoGPTWhiteBox unified forward signature returning either
(logits, loss) for training or (logits, present_kvs) for inference.
max_seq_len bumped 1024 -> 2048 per Dean's sizing math.
- Two new workloads (nanogpt-prefill, nanogpt-decode) sharing the
same trained checkpoint. Prefill demonstrates compute-bound
behavior (~289 FLOP/byte at ctx=1792); decode demonstrates the
bandwidth-bound regime (~0.5 FLOP/byte) that dominates LLM serving.
- smoke_nanogpt_phases.py harness with intensity-ratio gate >= 5x;
measured 578x on M-series MPS.
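The two intensity figures follow from first-order FLOP/byte accounting; a sketch under fp32 weights, counting only weight traffic (so the prefill number is an upper bound — the measured ~289 at ctx=1792 sits below it once activation and KV-cache traffic are included):

```python
def decode_intensity(bytes_per_weight: float = 4.0) -> float:
    """Matrix-vector decode: 2 FLOPs (multiply + add) per weight
    element, with each weight streamed from memory once per token."""
    return 2.0 / bytes_per_weight

def prefill_intensity(tokens: int, bytes_per_weight: float = 4.0) -> float:
    """Prefill amortizes each weight load across every token in the
    context, so intensity grows linearly with context length."""
    return 2.0 * tokens / bytes_per_weight
```

The decode figure comes out to exactly the ~0.5 FLOP/byte quoted above, which is why decode dominates LLM serving from the bandwidth-bound side of the roofline.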
Working group sign-off: Dean (proposer + verifier).
Branch parked; not for merge to dev. Three iterations complete; seven
remaining per the autonomous loop plan.
Snapshots the autonomous-iteration work happening in the standalone
/Users/VJ/GitHub/mlperf-edu/ repo. Two iterations folded in:
iter-1: code-defect cleanup (Patterson + Dean sign-off)
- Remove dead simulated_loss + load_real_wikitext_data from
nanogpt_train.py; align NanoGPTWhiteBox vocab to char-level
(50,257 -> 128, dropping 19.3M unused embedding params).
- Fix two broken examples.{edge,mobile} imports in inference paths.
- Reconcile README benchmark table with workloads.yaml (was wrong
on 7 of 16 workloads).
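The "19.3M unused embedding params" figure from iter-1 is recoverable by arithmetic; the embedding width is not stated here, so this sketch assumes n_embd=384 (a common char-level NanoGPT width) and tied input/output embeddings:

```python
def embedding_savings(old_vocab: int, new_vocab: int, n_embd: int) -> int:
    """Parameters dropped from the token-embedding table when the vocab
    shrinks (tied embeddings, so the table is counted once)."""
    return (old_vocab - new_vocab) * n_embd

saved = embedding_savings(50_257, 128, n_embd=384)   # 19,249,536, ~19.25M
```

Under these assumptions the computed savings land within rounding of the quoted 19.3M.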
iter-2: DLRM DRAM-resident variant (Emer sign-off)
- New MicroDLRMDRAM with 2M-row hash-mapped virtual EmbeddingBag,
sized so the per-batch byte transfer (8 MB at B=8192, m_spa=256)
takes long enough to clear PyTorch's ~50 us dispatch floor and
exhibit the bandwidth-bound regime production DLRM lives in.
- Smoke test asserts pure-lookup gap >= 3x; current host shows
4.29x end-to-end and 3.49x lookup-only.
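The sizing arithmetic behind the 8 MB figure, plus the dispatch-floor comparison at an assumed sustained bandwidth (the 100 GB/s is illustrative, not a measured peak):

```python
def embedding_batch_bytes(batch: int, m_spa: int, dtype_bytes: int = 4) -> int:
    """Bytes of pooled embedding output per batch: one fp32 row of
    width m_spa per sample."""
    return batch * m_spa * dtype_bytes

per_batch = embedding_batch_bytes(8192, 256)     # 8,388,608 B = 8 MiB
transfer_us = per_batch / 100e9 * 1e6            # ~84 us at an assumed 100 GB/s
# ~84 us > ~50 us dispatch floor, so kernel time is dominated by memory
# traffic rather than launch overhead: the bandwidth-bound regime.
```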
Branch is parked; not for merge to dev. Iteration log lives in the
standalone repo under .iteration_log/ (gitignored locally).