20 of 20 workloads now schema-valid; 9 of 11 measurable workloads have
evidence-bound regime values backed by sidecars in roofline/. The
linter passes --verify-against-sidecars across the suite. 13 prior
guess-classifications were corrected by measurement; the surprises
(DLRM compute-bound, ResNet bandwidth-bound, Diffusion bandwidth-bound)
will inform the paper's prose. Branch parked.
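For readers outside the repo: a sidecar here is a small per-workload
JSON record under roofline/ holding the measured numbers a regime
claim is bound to. A minimal hypothetical shape (field names are
illustrative, not the actual schema):

    # Hypothetical sidecar contents, shown as a Python dict so fields
    # can be annotated; the real schema in roofline/ may differ.
    sidecar = {
        "workload": "nanogpt-prefill",
        "intensity_flop_per_byte": 289.0,  # measured arithmetic intensity
        "peak_flops": 3.0e12,              # measured peak (illustrative value)
        "peak_bw_bytes_per_s": 2.0e11,     # measured bandwidth (illustrative)
        "regime": "compute-bound",         # derived from the above, never hand-written
    }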
Folds in:
- bench/measure_peaks.py: real per-machine peak FLOPS + BW
  measurement (sketch below).
- roofline.py: reads from the cache.
- manifest.py: rejects dirty trees on closed division.
- check_taxonomy.py: new --verify-against-sidecars flag.
- nanogpt_prefill: emits sidecars.
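A minimal sketch of what the per-machine peak measurement does,
assuming PyTorch on MPS (sizes, iteration counts, and names are
illustrative; bench/measure_peaks.py is the real thing):

    import time
    import torch

    def measure_peak_flops(n=4096, iters=20, device="mps"):
        # Time back-to-back n x n matmuls; each one costs 2*n^3 FLOPs.
        a = torch.randn(n, n, device=device)
        b = torch.randn(n, n, device=device)
        for _ in range(3):
            a @ b                          # warmup
        torch.mps.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            a @ b
        torch.mps.synchronize()
        return 2 * n**3 * iters / (time.perf_counter() - t0)

    def measure_peak_bw(nbytes=1 << 30, iters=20, device="mps"):
        # Clone a 1 GiB buffer; each clone reads and writes nbytes.
        x = torch.empty(nbytes // 4, dtype=torch.float32, device=device)
        torch.mps.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            x.clone()
        torch.mps.synchronize()
        return 2 * nbytes * iters / (time.perf_counter() - t0)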
Empirical findings: hardcoded M1 peaks were 5.5-7.7x off for this
machine (M-series Pro/Max). The --verify-against-sidecars flag caught
a YAML claim that didn't survive real measurement: the nanogpt-prefill
dispatch claim had been calibrated against the wrong peaks. A sketch
of the check follows.
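Conceptually the check reduces to comparing each YAML regime claim
against the ridge point implied by the measured peaks, which is
exactly why wrong peaks flip classifications. A sketch, reusing the
hypothetical sidecar fields above (the real logic lives in
check_taxonomy.py and is presumably more careful):

    def verify_regime(claimed, sidecar):
        # Ridge point: the intensity at which this machine crosses
        # from bandwidth-bound to compute-bound.
        ridge = sidecar["peak_flops"] / sidecar["peak_bw_bytes_per_s"]
        measured = ("compute-bound"
                    if sidecar["intensity_flop_per_byte"] >= ridge
                    else "bandwidth-bound")
        if claimed != measured:
            raise ValueError(f"YAML says {claimed}; measurement says {measured}")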
Branch parked. 6 of 10 iterations complete (counting iteration 5.5).
Snapshots iter-3 from the standalone repo. Adds:
- Real KV-cache plumbing in gpt2_infer.py (CausalSelfAttention,
  GPTBlock, and GPT2WhiteBox now support use_kv_cache +
  past_key_values); see the sketch after this list.
- NanoGPTWhiteBox unified forward signature returning either
  (logits, loss) for training or (logits, present_kvs) for inference;
  max_seq_len bumped 1024 -> 2048 per Dean's sizing math.
- Two new workloads (nanogpt-prefill, nanogpt-decode) sharing the
  same trained checkpoint. Prefill demonstrates compute-bound
  behavior (~289 FLOP/byte at ctx=1792); decode demonstrates the
  bandwidth-bound regime (~0.5 FLOP/byte) that dominates LLM serving.
  A back-of-envelope for both numbers follows this list.
- smoke_nanogpt_phases.py harness with intensity-ratio gate >= 5x;
measured 578x on M-series MPS.
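As referenced in the first item, a minimal sketch of the KV-cache
pattern (shapes and names are illustrative, not gpt2_infer.py's exact
code; the causal mask needed during prefill is omitted for brevity):

    import torch

    def attention_step(q, k_new, v_new, past_kv=None):
        # q, k_new, v_new: (batch, heads, t_new, head_dim). Prefill
        # passes the whole prompt with past_kv=None; decode passes one
        # token plus the cache returned by the previous step.
        if past_kv is not None:
            k = torch.cat([past_kv[0], k_new], dim=2)  # grow along time
            v = torch.cat([past_kv[1], v_new], dim=2)
        else:
            k, v = k_new, v_new
        att = (q @ k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
        out = att.softmax(dim=-1) @ v
        return out, (k, v)                 # (output, present_kv)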
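And the promised back-of-envelope for the two intensity numbers,
under the simplifying assumption that fp32 weight traffic dominates
the byte count:

    # One linear layer with a (d_in x d_out) weight, processing T tokens:
    #   FLOPs ~= 2 * T * d_in * d_out
    #   bytes ~= 4 * d_in * d_out          (weights, fp32)
    #   intensity ~= T / 2  FLOP/byte
    # decode  (T=1):    ~0.5 FLOP/byte -> bandwidth-bound, as measured
    # prefill (T=1792): <= 896 FLOP/byte in this idealized bound;
    #   attention and activation traffic pull the measured value to ~289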
Working group sign-off: Dean (proposer + verifier).
Branch parked; not for merge to dev. Three iterations complete; seven
remaining per the autonomous loop plan.
Snapshots the autonomous-iteration work happening in the standalone
/Users/VJ/GitHub/mlperf-edu/ repo. Two iterations folded in:
iter-1: code-defect cleanup (Patterson + Dean sign-off)
- Remove dead simulated_loss + load_real_wikitext_data from
  nanogpt_train.py; align NanoGPTWhiteBox vocab to char-level
  (50,257 -> 128, dropping ~19.3M unused embedding params; arithmetic
  below).
- Fix two broken examples.{edge,mobile} imports in inference paths.
- Reconcile README benchmark table with workloads.yaml (was wrong
on 7 of 16 workloads).
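Sanity check on the ~19.3M figure (the embedding width is inferred
from the arithmetic, not stated in this log):

    dropped_rows = 50_257 - 128          # 50,129 unused char-level rows
    # 19.3e6 / 50_129 ~= 385, consistent (to rounding) with the
    # d_model = 384 that standard char-level nanoGPT configs use:
    params_dropped = dropped_rows * 384  # 19,249,536 ~= 19.25M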
iter-2: DLRM DRAM-resident variant (Emer sign-off)
- New MicroDLRMDRAM with a 2M-row hash-mapped virtual EmbeddingBag,
  sized so the per-batch transfer (8 MB at B=8192, m_spa=256) takes
  long enough to clear PyTorch's ~50 us dispatch floor and exhibit
  the bandwidth-bound regime production DLRM lives in (sketch after
  this list).
- Smoke test asserts a pure-lookup gap >= 3x; the current host shows
  4.29x end-to-end and 3.49x lookup-only.
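As referenced above, the sizing arithmetic plus the hash-mapped
lookup idea in sketch form (MicroDLRMDRAM's real implementation may
differ):

    import torch
    import torch.nn as nn

    B, m_spa, rows = 8192, 256, 2_000_000
    bytes_per_batch = B * m_spa * 4      # fp32: 8,388,608 bytes = 8 MiB
    # The table itself is 2M x 256 x 4 B ~= 2 GB: far past any cache
    # level, which is the point of the DRAM-resident variant.

    class HashedEmbeddingBag(nn.Module):
        # A "virtual" table: arbitrary ids hash (here: modulo) into a
        # fixed 2M-row bag, so every batch strides across DRAM.
        def __init__(self, rows, dim):
            super().__init__()
            self.rows = rows
            self.bag = nn.EmbeddingBag(rows, dim, mode="sum")

        def forward(self, ids, offsets):
            return self.bag(ids % self.rows, offsets)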
Branch is parked; not for merge to dev. Iteration log lives in the
standalone repo under .iteration_log/ (gitignored locally).