Snapshot of the standalone /Users/VJ/GitHub/mlperf-edu/ repo as of 2026-04-16, brought into MLSysBook as a parked feature branch for backup and iteration. Not for merge to dev.
Contents (88 files, ~2.3 MB):
- 16 reference workloads (cloud / edge / tiny / agent divisions)
- LoadGen proxy harness + SUT plugin protocol
- Compliance checker, autograder, hardware fingerprint
- Paper draft (paper.tex) with TikZ/SVG figure sources
- Three lab examples + practitioner workflow configs
- Workload + dataset YAML registries (single source of truth)
Excluded (per mlperf-edu/.gitignore + size constraints):
- Datasets (6.6 GB), checkpoints (260 MB), gpt2 weights (523 MB)
- Generated PDFs, .venv, build artifacts
MLPerf EDU — Getting Started Guide
Setup
# Clone the repository
git clone https://github.com/harvard-edge/mlperf-edu.git
cd mlperf-edu
# Create virtual environment and install
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
Quick Start: Your First Benchmark
1. Train NanoGPT (5 minutes)
mlperf run cloud --task nanogpt-12m
This trains a ~12M-parameter GPT-2 variant on TinyShakespeare. You'll see:
- Training loss converging from ~4.3 to ~2.25
- Inference latency measured at the end
- A JSON submission file saved to submissions/
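Once the run finishes, you can peek at the submission from Python. A minimal sketch; the JSON's field names are the harness's own, so this just prints whatever keys your version emits:
import json
from pathlib import Path
# Pick the most recently written submission (directory per the step above).
latest = max(Path("submissions").glob("*.json"), key=lambda p: p.stat().st_mtime)
with latest.open() as f:
    submission = json.load(f)
# Print top-level fields; the exact schema depends on the harness version.
for key, value in submission.items():
    print(f"{key}: {value}")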
2. Generate a Report
mlperf report --submission submissions/<your_file>.json
Open the generated HTML report in your browser. It shows:
- Metrics summary (loss, latency, throughput)
- Hardware fingerprint (for auditability)
- Convergence behavior
- SHA-256 hashes (anti-tampering)
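To spot-check the anti-tampering hashes yourself, recompute the digest of your submission file and compare it against the value shown in the report. A minimal sketch; the filename is a placeholder:
import hashlib
from pathlib import Path
path = Path("submissions/your_run.json")  # substitute your actual file
digest = hashlib.sha256(path.read_bytes()).hexdigest()
print(digest)  # should match the SHA-256 listed in the HTML report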
3. Run All Workloads
mlperf train --all # Train all 16 workloads
mlperf train --division cloud # Just the cloud suite
Lab Structure
Lab 1: Training Optimization (Closed Division)
Goal: Reduce ResNet-18 training time by 20% without dropping below the quality target.
# Baseline run
mlperf run edge --task resnet18
# Your optimized run
python examples/lab1_optimization.py
What you'll learn:
- Batch size vs. convergence tradeoffs
- Data loading bottlenecks (num_workers)
- Learning rate scheduling
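As a rough illustration of those three knobs, here is a plain-PyTorch sketch; the dataset and model are stand-ins, not the lab's actual ResNet-18 pipeline:
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
# Stand-in data; the real lab trains ResNet-18 on the edge workload.
data = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))
# num_workers parallelizes data loading, a common hidden bottleneck;
# batch_size trades per-step speed against convergence behavior.
loader = DataLoader(data, batch_size=128, num_workers=4, pin_memory=True)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=25)  # LR schedule
for epoch in range(25):
    for x, y in loader:
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    sched.step()  # decay the learning rate once per epoch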
Lab 2: Inference Architecture (Open Division)
Goal: Build a System Under Test (SUT) that handles the load generator's query stream.
python examples/lab2_inference_sut.py
What you'll learn:
- Latency percentiles (p50/p90/p99)
- Throughput vs. latency tradeoffs
- Batching strategies
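The harness defines its own SUT plugin protocol; purely to illustrate the batching tradeoff (class and method names below are invented, not the real interface), a batching SUT might look like:
import numpy as np
class BatchingSUT:
    """Hypothetical System Under Test that buffers queries into batches."""
    def __init__(self, model, max_batch=32):
        self.model = model          # any callable: batch -> predictions
        self.max_batch = max_batch  # bigger batches: more throughput, worse p99
        self.queue = []
    def issue_query(self, sample):
        self.queue.append(sample)
        if len(self.queue) >= self.max_batch:
            return self.flush()
        return None  # caller waits until the batch fills, adding latency
    def flush(self):
        batch = np.stack(self.queue)
        self.queue.clear()
        return self.model(batch)
# Example: a trivial "model" that just sums features per sample
sut = BatchingSUT(lambda b: b.sum(axis=1), max_batch=4)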
Lab 3: Architecture Comparison
Goal: Compare dense (NanoGPT) vs. sparse (Nano-MoE) architectures.
python examples/lab3_arch_comparison.py
What you'll learn:
- Expert specialization in MoE
- Routing overhead vs. quality improvement
- Parameter efficiency
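For intuition about routing overhead versus conditional compute, here is a generic top-1 mixture-of-experts layer in PyTorch; this is a sketch, not Nano-MoE's actual implementation:
import torch
import torch.nn as nn
class Top1MoE(nn.Module):
    def __init__(self, dim=64, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # routing adds a little compute...
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
    def forward(self, x):  # x: (tokens, dim)
        top = self.router(x).argmax(dim=-1)  # pick one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                out[mask] = expert(x[mask])  # ...but only one expert runs per token
        return out
print(Top1MoE()(torch.randn(8, 64)).shape)  # torch.Size([8, 64])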
Declarative Interface (YAML)
# experiment.yaml
workload: nanogpt-12m # S.Model
dataset: tinyshakespeare # S.Data
target_quality: 2.3 # S.Constraints
epochs: 25 # S.Constraints
mlperf config experiment.yaml
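Because the YAML registries are the single source of truth, the same file can also be read programmatically. A sketch using PyYAML; how the harness itself resolves these fields is not shown here:
import yaml  # pip install pyyaml
with open("experiment.yaml") as f:
    cfg = yaml.safe_load(f)
print(cfg["workload"], cfg["dataset"])  # nanogpt-12m tinyshakespeare
print(cfg["target_quality"])            # 2.3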
Available Workloads
| Division | Workload | Time | Key Concept |
|---|---|---|---|
| Cloud | NanoGPT | 89s | O(N²) attention scaling |
| Cloud | Nano-MoE | 158s | Conditional compute |
| Cloud | DLRM | 5s | Sparse vs. dense memory |
| Cloud | Diffusion | 41s | Denoising step count |
| Cloud | GCN | 2s | Message passing |
| Cloud | BERT | 45s | Bidirectional attention |
| Cloud | LSTM | 20s | Sequential bottleneck |
| Cloud | RL | 1s | Policy gradient variance |
| Edge | ResNet-18 | 64s | Skip connections + batch norm |
| Edge | MobileNetV2 | 60s | Depthwise-separable convolutions |
| Tiny | DS-CNN | 51s | Spectrogram features |
| Tiny | Anomaly AE | 6s | Reconstruction error |
| Tiny | VWW | 10s | Sub-10K model compression |
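To see why the NanoGPT row cites O(N²) attention scaling, count the entries in the attention score matrix as context length grows; a back-of-the-envelope sketch:
# Self-attention compares every token with every other token,
# so the score matrix holds N * N entries per head.
for n in (256, 512, 1024):
    print(f"context {n:>5}: {n * n:>9,} attention scores")
# Doubling the context quadruples the attention cost.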
Submission & Grading
After each run, the harness produces a JSON submission:
# Verify your submission
mlperf verify --submission submissions/your_run.json
# Generate a grading artifact (for TAs)
mlperf submit
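If you have several runs, a small loop can verify them all before generating the grading artifact. A sketch that assumes mlperf verify exits nonzero on failure, which this guide does not actually specify:
import subprocess
from pathlib import Path
for sub in sorted(Path("submissions").glob("*.json")):
    result = subprocess.run(["mlperf", "verify", "--submission", str(sub)])
    print(f"{sub.name}: {'ok' if result.returncode == 0 else 'FAILED'}")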
Need Help?
- mlperf about — Architecture overview
- mlperf list — All available workloads
- mlperf --help — Full CLI reference