Snapshot of the standalone /Users/VJ/GitHub/mlperf-edu/ repo as of 2026-04-16, brought into MLSysBook as a parked feature branch for backup and iteration. Not for merge to dev.
Contents (88 files, ~2.3 MB):
- 16 reference workloads (cloud / edge / tiny / agent divisions)
- LoadGen proxy harness + SUT plugin protocol
- Compliance checker, autograder, hardware fingerprint
- Paper draft (paper.tex) with TikZ/SVG figure sources
- Three lab examples + practitioner workflow configs
- Workload + dataset YAML registries (single source of truth)
Excluded (per mlperf-edu/.gitignore + size constraints):
- Datasets (6.6 GB), checkpoints (260 MB), gpt2 weights (523 MB)
- Generated PDFs, .venv, build artifacts
MLPerf EDU — Getting Started Guide
Setup
# Clone the repository
git clone https://github.com/harvard-edge/mlperf-edu.git
cd mlperf-edu
# Create virtual environment and install
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
Quick Start: Your First Benchmark
1. Train NanoGPT (5 minutes)
mlperf run cloud --task nanogpt-12m
This trains a ~12M-parameter GPT-2 variant on TinyShakespeare. You'll see:
- Training loss converging from ~4.3 to ~2.25
- Inference latency measured at the end
- A JSON submission file saved to submissions/
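Once the run finishes, you can peek at the submission from Python. A minimal sketch; the JSON's field names are the harness's own, so this just prints whatever keys your version emits:
import json
from pathlib import Path
# Pick the most recently written submission (directory per the step above).
latest = max(Path("submissions").glob("*.json"), key=lambda p: p.stat().st_mtime)
with latest.open() as f:
    submission = json.load(f)
# Print top-level fields; the exact schema depends on the harness version.
for key, value in submission.items():
    print(f"{key}: {value}")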
2. Generate a Report
mlperf report --submission submissions/<your_file>.json
Open the generated HTML report in your browser. It shows:
- Metrics summary (loss, latency, throughput)
- Hardware fingerprint (for auditability)
- Convergence behavior
- SHA-256 hashes (anti-tampering)
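To spot-check the anti-tampering hashes yourself, recompute the digest of your submission file and compare it against the value shown in the report. A minimal sketch; the filename is a placeholder:
import hashlib
from pathlib import Path
path = Path("submissions/your_run.json")  # substitute your actual file
digest = hashlib.sha256(path.read_bytes()).hexdigest()
print(digest)  # should match the SHA-256 listed in the HTML report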
3. Run All Workloads
mlperf train --all # Train all 16 workloads
mlperf train --division cloud # Just the cloud suite
Lab Structure
Lab 1: Training Optimization (Closed Division)
Goal: Reduce ResNet-18 training time by 20% without dropping below the quality target.
# Baseline run
mlperf run edge --task resnet18
# Your optimized run
python examples/lab1_optimization.py
What you'll learn:
- Batch size vs. convergence tradeoffs
- Data loading bottlenecks (num_workers)
- Learning rate scheduling
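As a rough illustration of those three knobs, here is a plain-PyTorch sketch; the dataset and model are stand-ins, not the lab's actual ResNet-18 pipeline:
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
# Stand-in data; the real lab trains ResNet-18 on the edge workload.
data = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))
# num_workers parallelizes data loading, a common hidden bottleneck;
# batch_size trades per-step speed against convergence behavior.
loader = DataLoader(data, batch_size=128, num_workers=4, pin_memory=True)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=25)  # LR schedule
for epoch in range(25):
    for x, y in loader:
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    sched.step()  # decay the learning rate once per epoch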
Lab 2: Inference Architecture (Open Division)
Goal: Build a System Under Test (SUT) that handles the load generator's query stream.
python examples/lab2_inference_sut.py
What you'll learn:
- Latency percentiles (p50/p90/p99)
- Throughput vs. latency tradeoffs
- Batching strategies
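The harness defines its own SUT plugin protocol; purely to illustrate the batching tradeoff (class and method names below are invented, not the real interface), a batching SUT might look like:
import numpy as np
class BatchingSUT:
    """Hypothetical System Under Test that buffers queries into batches."""
    def __init__(self, model, max_batch=32):
        self.model = model          # any callable: batch -> predictions
        self.max_batch = max_batch  # bigger batches: more throughput, worse p99
        self.queue = []
    def issue_query(self, sample):
        self.queue.append(sample)
        if len(self.queue) >= self.max_batch:
            return self.flush()
        return None  # caller waits until the batch fills, adding latency
    def flush(self):
        batch = np.stack(self.queue)
        self.queue.clear()
        return self.model(batch)
# Example: a trivial "model" that just sums features per sample
sut = BatchingSUT(lambda b: b.sum(axis=1), max_batch=4)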
Lab 3: Architecture Comparison
Goal: Compare dense (NanoGPT) vs. sparse (Nano-MoE) architectures.
python examples/lab3_arch_comparison.py
What you'll learn:
- Expert specialization in MoE
- Routing overhead vs. quality improvement
- Parameter efficiency
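For intuition about routing overhead versus conditional compute, here is a generic top-1 mixture-of-experts layer in PyTorch; this is a sketch, not Nano-MoE's actual implementation:
import torch
import torch.nn as nn
class Top1MoE(nn.Module):
    def __init__(self, dim=64, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # routing adds a little compute...
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
    def forward(self, x):  # x: (tokens, dim)
        top = self.router(x).argmax(dim=-1)  # pick one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                out[mask] = expert(x[mask])  # ...but only one expert runs per token
        return out
print(Top1MoE()(torch.randn(8, 64)).shape)  # torch.Size([8, 64])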
Declarative Interface (YAML)
# experiment.yaml
workload: nanogpt-12m # S.Model
dataset: tinyshakespeare # S.Data
target_quality: 2.3 # S.Constraints
epochs: 25 # S.Constraints
mlperf config experiment.yaml
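Because the YAML registries are the single source of truth, the same file can also be read programmatically. A sketch using PyYAML; how the harness itself resolves these fields is not shown here:
import yaml  # pip install pyyaml
with open("experiment.yaml") as f:
    cfg = yaml.safe_load(f)
print(cfg["workload"], cfg["dataset"])  # nanogpt-12m tinyshakespeare
print(cfg["target_quality"])            # 2.3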
Available Workloads
| Division | Workload | Time | Key Concept |
|---|---|---|---|
| Cloud | NanoGPT | 89s | O(N²) attention scaling |
| Cloud | Nano-MoE | 158s | Conditional compute |
| Cloud | DLRM | 5s | Sparse vs. dense memory |
| Cloud | Diffusion | 41s | Denoising step count |
| Cloud | GCN | 2s | Message passing |
| Cloud | BERT | 45s | Bidirectional attention |
| Cloud | LSTM | 20s | Sequential bottleneck |
| Cloud | RL | 1s | Policy gradient variance |
| Edge | ResNet-18 | 64s | Skip connections + batch norm |
| Edge | MobileNetV2 | 60s | Depthwise-separable convolutions |
| Tiny | DS-CNN | 51s | Spectrogram features |
| Tiny | Anomaly AE | 6s | Reconstruction error |
| Tiny | VWW | 10s | Sub-10K model compression |
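To see why the NanoGPT row cites O(N²) attention scaling, count the entries in the attention score matrix as context length grows; a back-of-the-envelope sketch:
# Self-attention compares every token with every other token,
# so the score matrix holds N * N entries per head.
for n in (256, 512, 1024):
    print(f"context {n:>5}: {n * n:>9,} attention scores")
# Doubling the context quadruples the attention cost.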
Submission & Grading
After each run, the harness produces a JSON submission:
# Verify your submission
mlperf verify --submission submissions/your_run.json
# Generate a grading artifact (for TAs)
mlperf submit
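If you have several runs, a small loop can verify them all before generating the grading artifact. A sketch that assumes mlperf verify exits nonzero on failure, which this guide does not actually specify:
import subprocess
from pathlib import Path
for sub in sorted(Path("submissions").glob("*.json")):
    result = subprocess.run(["mlperf", "verify", "--submission", str(sub)])
    print(f"{sub.name}: {'ok' if result.returncode == 0 else 'FAILED'}")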
Need Help?
- mlperf about — Architecture overview
- mlperf list — All available workloads
- mlperf --help — Full CLI reference