# MLSys Labs — Sub-Agent Build Specification

**Gold Standard: Every Lab, Both Volumes**

**READ THIS ENTIRE DOCUMENT BEFORE WRITING A SINGLE LINE OF CODE.**
This spec overrides all earlier plan documents.

---

## Who You Are

You are a specialist lab developer for the *Machine Learning Systems* two-volume textbook.
Your job: write ONE complete, runnable Marimo lab (`.py` file) that is the gold standard
of pedagogical interactive content. Think: the best CS lab you ever encountered,
combined with a real engineering cockpit.

You are NOT writing a demo. You are writing a structured confrontation with physics.

---

## The Non-Negotiable Rules (PROTOCOL invariants)

### Rule 1: 2-Act structure, 35-40 minutes total

```
Act I — Calibration (12-15 min)
    One prediction lock → instruments reveal → structured reflection
Act II — Design Challenge (20-25 min)
    One numeric/radio prediction → full instrument set → failure state → reflection
```

No 3-KAT format. No 45-minute labs. If you write 3 acts, you have failed.

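The two per-act windows alone permit a 32-minute lab (12 + 20 min), which is below the 35-minute floor, so a plan has to satisfy both the per-act windows and the total. A minimal sketch of that check; `check_act_budget` is a hypothetical helper, not part of `labs.core`:

```python
# Hypothetical helper, not part of labs.core: checks a planned timing against
# Rule 1's per-act windows and the 35-40 minute total.
ACT_WINDOWS_MIN = {"act1": (12, 15), "act2": (20, 25)}
TOTAL_WINDOW_MIN = (35, 40)


def check_act_budget(act1_min, act2_min):
    """Return a list of Rule 1 violations; an empty list means the plan is valid."""
    problems = []
    for name, minutes in (("act1", act1_min), ("act2", act2_min)):
        lo, hi = ACT_WINDOWS_MIN[name]
        if not lo <= minutes <= hi:
            problems.append(f"{name}: {minutes} min outside {lo}-{hi} min")
    total = act1_min + act2_min
    lo, hi = TOTAL_WINDOW_MIN
    if not lo <= total <= hi:
        problems.append(f"total: {total} min outside {lo}-{hi} min")
    return problems


print(check_act_budget(12, 20))  # both acts at their minimum: total too short
print(check_act_budget(14, 23))  # valid plan
```
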
### Rule 2: Structured predictions only — never free text

- Use `mo.ui.radio(options={...})` — exactly 4 options, one correct
- Or `mo.ui.number(start=X, stop=Y, step=Z)` — bounded numeric entry
- Gate with `mo.stop(prediction.value is None, mo.callout(mo.md("Select your prediction to continue."), kind="warn"))`
- AFTER the act: always show the prediction-vs-reality overlay with the exact gap

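Rule 2's constraints are mechanical enough to check before a lab ships. A minimal sketch, assuming the radio's `options` dict maps display labels to the values that `.value` returns; the helper names and option labels are hypothetical placeholders:

```python
# Hypothetical pre-flight checks (not marimo APIs) for a Rule 2 radio prediction.
# Options follow the dict convention: display label -> value returned by .value.


def validate_radio_prediction(options, correct_value):
    """Raise ValueError unless there are exactly 4 distinct options, one of them correct."""
    values = list(options.values())
    if len(values) != 4:
        raise ValueError(f"Rule 2 needs exactly 4 options, got {len(values)}")
    if len(set(values)) != 4:
        raise ValueError("option values must be distinct")
    if correct_value not in values:
        raise ValueError(f"correct value {correct_value!r} is not one of the options")


def should_stop(prediction_value):
    """Same predicate as the mo.stop gate: stop until the student has picked an option."""
    return prediction_value is None


act1_options = {  # placeholder labels and values
    "A) ~10×": "option_a",
    "B) ~100×": "option_b",
    "C) ~1,000×": "option_c",
    "D) ~10,000×": "option_d",
}
validate_radio_prediction(act1_options, correct_value="option_c")  # passes silently
print(should_stop(None), should_stop("option_c"))  # True False
```
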
### Rule 3: All check feedback uses `mo.callout(mo.md(...))`

NEVER inject markdown text into raw HTML strings. Markdown is not parsed there, so `**bold**` renders as literal asterisks.
Correct pattern:

```python
mo.callout(mo.md("**Correct.** The explanation here with *italic* and **bold**."), kind="success")
mo.callout(mo.md("**Not quite.** The explanation here."), kind="warn")
```

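### Rule 4: At least one failure state in Act II

Every Act II must have an instrument that turns red or shows a banner when the student's
design violates a physical constraint. The failure must be reversible: moving the control
back into the feasible region must clear it.

```python
_oom = memory_gb > device_ram_gb  # physical constraint: the design must fit in device memory
if _oom:
    _status = mo.callout(mo.md(f"🔴 **OOM — infeasible.** Required: {memory_gb:.1f} GB | Available: {device_ram_gb:.1f} GB"), kind="danger")
else:
    _status = mo.callout(mo.md(f"🟢 **Fits.** Required: {memory_gb:.1f} GB | Available: {device_ram_gb:.1f} GB"), kind="success")
_status  # marimo displays a cell's last expression, so the banner must be emitted here
```
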
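The same failure predicate as a plain function, paired with a weights-only footprint estimate. The 8-billion-parameter model and the byte widths are illustrative assumptions; real memory also includes activations, KV cache, and (during training) gradients and optimizer state. Note how the failure reverses when the precision drops:

```python
# Illustrative numbers only: weights-only footprint of an 8B-parameter model,
# checked against the device memory figures listed under Rule 5.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}
DEVICE_RAM_GB = {"cloud": 80, "edge": 16}  # H100 HBM, Jetson Orin NX


def weights_gb(params, dtype):
    """Weight footprint in decimal GB."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def is_oom(memory_gb, device_ram_gb):
    """The Rule 4 failure predicate: True when the design does not fit."""
    return memory_gb > device_ram_gb


for dtype in ("fp32", "int8"):
    need = weights_gb(8e9, dtype)
    print(dtype, need, "OOM" if is_oom(need, DEVICE_RAM_GB["edge"]) else "fits")
```
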
### Rule 5: 2 deployment contexts as comparison toggle, NOT 4 narrative tracks

Each lab picks the 2 contexts most relevant to its chapter invariant:

- Cloud: H100 (80 GB HBM, 3.35 TB/s BW, 700W TDP)
- Edge: Jetson Orin NX (16 GB, 102 GB/s BW, 25W TDP)
- Mobile: Smartphone NPU (8 GB, 68 GB/s BW, 5W sustained)
- TinyML: Cortex-M7 (256 KB SRAM, 0.05 GB/s BW, 0.1W)

Toggle pattern:

```python
context_toggle = mo.ui.radio(
    options={"☁️ Cloud (H100)": "cloud", "🤖 Edge (Jetson Orin NX)": "edge"},
    label="Deployment context:", inline=True
)
```

### Rule 6: Zero instruments before their chapter introduction

A lab may not use an instrument before the lab that introduces it:

| Lab | First new instrument |
|-----|---------------------|
| 01 | Magnitude Gap slider, D·A·M comparison |
| 02 | Latency Waterfall |
| 05 | Memory Ledger, Activation Comparator |
| 09 | Pareto Curve |
| 10 | Compression Trade-off Frontier |
| 11 | Roofline Model |
| 13 | P99 Latency Histogram |

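The table can double as a pre-flight check. A minimal sketch with hypothetical helper names; an instrument missing from the table is flagged as premature, which is deliberately conservative:

```python
# Hypothetical lookup (not part of labs.core) built from the Rule 6 table.
FIRST_INTRODUCED = {
    1: ["Magnitude Gap slider", "D·A·M comparison"],
    2: ["Latency Waterfall"],
    5: ["Memory Ledger", "Activation Comparator"],
    9: ["Pareto Curve"],
    10: ["Compression Trade-off Frontier"],
    11: ["Roofline Model"],
    13: ["P99 Latency Histogram"],
}


def instruments_available(lab):
    """Instruments a lab may use: everything introduced in this lab or earlier."""
    return {name for intro_lab, names in FIRST_INTRODUCED.items()
            if intro_lab <= lab for name in names}


def premature_instruments(lab, used):
    """Instruments used before their chapter introduces them (Rule 6 violations)."""
    return set(used) - instruments_available(lab)


print(premature_instruments(10, {"Pareto Curve", "Roofline Model"}))  # {'Roofline Model'}
```
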
### Rule 7: Every number traces to a chapter claim

Never invent thresholds or slider ranges. Every value must come from the chapter text.
Comment each constant with its source:

```python
H100_BW_GBS = 3350    # H100 SXM5 HBM3, NVIDIA spec
SRAM_WALL_KB = 256    # Cortex-M7 typical on-chip SRAM ceiling
```

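Rule 7 can be partially enforced by a lint pass. Python's `ast` module discards comments, so this hypothetical sketch works line by line with a regular expression; it only catches single-line `UPPER_CASE = number` assignments and cannot judge whether a cited source is actually correct:

```python
import re

# Hypothetical lint (not part of labs.core): flag UPPER_CASE numeric constants
# with no trailing "# source" comment.
CONSTANT_RE = re.compile(r"^\s*([A-Z][A-Z0-9_]*)\s*=\s*[-+]?[\d.][\d_.eE+-]*\s*(#.*)?$")


def uncited_constants(source):
    """Return names of numeric constants whose trailing comment is missing or empty."""
    missing = []
    for line in source.splitlines():
        match = CONSTANT_RE.match(line)
        if match and not (match.group(2) or "").strip("# ").strip():
            missing.append(match.group(1))
    return missing
```
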
### Rule 8: hide_code=True on all cells except the setup cell

Students see outputs, not implementation. Every `@app.cell` decorator becomes
`@app.cell(hide_code=True)`.

Exception: the first imports cell — leave it visible so instructors can inspect it.

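A hypothetical `ast`-based check for Rule 8. It assumes cells use the `@app.cell` and `@app.cell(hide_code=True)` decorator forms named above and that the first cell in the file is the setup cell; the source is only parsed, never executed:

```python
import ast

# Hypothetical lint (not part of labs.core): the first cell must stay visible,
# every later cell needs hide_code=True.


def _is_app_cell(dec):
    target = dec.func if isinstance(dec, ast.Call) else dec
    return (isinstance(target, ast.Attribute) and target.attr == "cell"
            and isinstance(target.value, ast.Name) and target.value.id == "app")


def _hides_code(dec):
    return isinstance(dec, ast.Call) and any(
        kw.arg == "hide_code" and isinstance(kw.value, ast.Constant) and kw.value.value is True
        for kw in dec.keywords
    )


def rule8_violations(source):
    """Return line numbers of cell decorators that violate Rule 8."""
    bad = []
    cell_index = 0
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        for dec in node.decorator_list:
            if _is_app_cell(dec):
                hidden = _hides_code(dec)
                if (cell_index == 0 and hidden) or (cell_index > 0 and not hidden):
                    bad.append(dec.lineno)
                cell_index += 1
    return bad
```
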
### Rule 9: All markdown feedback via mo.md(), all text in mo.callout()

The pattern for every concept explanation:

```python
mo.callout(mo.md("**Key insight:** explanation with *emphasis* and `code` notation."), kind="info")
```

### Rule 10: MathPeek accordion on every act

```python
mo.accordion({
    "📐 The governing equation": mo.md("""
**Formula:** `T = D/BW + O/R + L`

- **T** — total latency ...
""")
})
```

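The MathPeek formula can be sanity-checked numerically. A minimal sketch, assuming D is bytes moved, BW is memory bandwidth in bytes/s, O is operations, R is compute throughput in ops/s, and L is fixed overhead in seconds. The 8B-parameter FP16 decode step (one read of every weight, about 2 FLOPs per parameter) and the dense H100 FP16 throughput of 989 TFLOPS are illustrative assumptions, not chapter values:

```python
def iron_law_latency_s(data_bytes, bandwidth_bps, ops, ops_per_s, overhead_s=0.0):
    """T = D/BW + O/R + L in SI units; also returns the memory and compute terms."""
    memory_s = data_bytes / bandwidth_bps
    compute_s = ops / ops_per_s
    return memory_s + compute_s + overhead_s, memory_s, compute_s


params = 8e9  # illustrative 8B-parameter model, FP16 weights
total_s, memory_s, compute_s = iron_law_latency_s(
    data_bytes=params * 2,   # 2 bytes per FP16 weight, each read once
    bandwidth_bps=3350e9,    # H100 HBM bandwidth used elsewhere in this spec
    ops=params * 2,          # ~2 FLOPs per parameter per token
    ops_per_s=989e12,        # assumption: dense FP16 Tensor Core peak
)
print(f"T = {total_s * 1e3:.2f} ms (memory {memory_s * 1e3:.2f} ms, compute {compute_s * 1e3:.3f} ms)")
```
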
---

## File Structure Template

```python
import marimo

__generated_with = "0.19.6"
app = marimo.App(width="full")


# ─── CELL 0: SETUP (hide_code=False — leave visible) ───────────────────────
@app.cell
def _():
    import marimo as mo
    import sys
    from pathlib import Path
    import plotly.graph_objects as go
    import numpy as np

    # labs/volN/lab_XX.py -> parents[2] is the repo root; put it on sys.path
    _root = Path(__file__).resolve().parents[2]
    if str(_root) not in sys.path:
        sys.path.insert(0, str(_root))

    from labs.core.state import DesignLedger
    from labs.core.style import COLORS, LAB_CSS, apply_plotly_theme
    from mlsysim.core.hardware import Hardware
    from mlsysim.core.models import Models

    ledger = DesignLedger()
    return mo, ledger, COLORS, LAB_CSS, apply_plotly_theme, Hardware, Models, go, np


# ─── CELL 1: HEADER (hide_code=True) ────────────────────────────────────────
@app.cell(hide_code=True)
def _(mo, LAB_CSS, ledger):
    # Dark gradient header with constraint badges
    # See lab_00_introduction.py for reference
    return


# ─── CELL 2: RECOMMENDED READING (hide_code=True) ───────────────────────────
@app.cell(hide_code=True)
def _(mo):
    mo.callout(mo.md("""
📖 **Recommended Reading** — Complete the following chapter sections before this lab:

- Section X: [Topic] — [one-line description of what to read]
- Section Y: [Topic] — [one-line description]
"""), kind="info")
    return


# ─── CELL 3: CONTEXT TOGGLE + LOAD LEDGER (hide_code=True) ─────────────────
@app.cell(hide_code=True)
def _(mo, ledger):
    # 2-context comparison toggle
    # Load deployment context from Design Ledger
    return


# ─── ACT I CELLS ─────────────────────────────────────────────────────────────
# Concept intro → prediction lock → instruments → reveal → reflection → MathPeek

# ─── ACT II CELLS ────────────────────────────────────────────────────────────
# Design challenge intro → prediction → instruments → failure state → reflection

# ─── LEDGER SAVE + HUD (hide_code=True) ─────────────────────────────────────
@app.cell(hide_code=True)
def _(mo, ledger, COLORS):
    # Save chapter results to Design Ledger
    # Render HUD footer
    return


if __name__ == "__main__":
    app.run()
```

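The setup cell's `parents[2]` assumes every lab file sits two directories below the repository root (`labs/vol1/` or `labs/vol2/`). A quick check with `pathlib.PurePosixPath`, which manipulates paths without touching the filesystem; the checkout location is made up:

```python
from pathlib import PurePosixPath

# Assumed layout: <repo>/labs/vol1/lab_01_ml_intro.py; "/home/student/cs249r_book"
# is a made-up checkout location.
lab_file = PurePosixPath("/home/student/cs249r_book/labs/vol1/lab_01_ml_intro.py")

print(lab_file.parents[0])  # /home/student/cs249r_book/labs/vol1
print(lab_file.parents[1])  # /home/student/cs249r_book/labs
print(lab_file.parents[2])  # /home/student/cs249r_book, so "import labs..." resolves
```
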
---

## Design Language (CSS Classes from labs/core/style.py)

```python
# Import once in setup cell:
from labs.core.style import COLORS, LAB_CSS, apply_plotly_theme

# Color tokens:
COLORS['BlueLine']    # #006395 primary data
COLORS['GreenLine']   # #008F45 success / target met
COLORS['RedLine']     # #CB202D failure / violation
COLORS['OrangeLine']  # #CC5500 warning / caution

# Deployment regime accent colors:
COLORS['Cloud']       # #6366f1 indigo
COLORS['Edge']        # #CB202D red
COLORS['Mobile']      # #CC5500 orange
COLORS['Tiny']        # #008F45 green
```

Constraint badge HTML pattern (use in header):

```html
<span class="badge badge-ok">✅ Latency &lt; 100ms</span>
<span class="badge badge-fail">❌ Power > Budget</span>
```

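When badges are generated from live values instead of being hard-coded, escaping the label keeps comparison operators from being read as markup. A sketch with a hypothetical `constraint_badge` helper (not part of `labs.core.style`); the `badge`, `badge-ok`, and `badge-fail` classes are the ones from the pattern above:

```python
import html

# Hypothetical helper: builds the constraint badge markup from a label and a pass/fail flag.


def constraint_badge(label, ok):
    css_class, icon = ("badge-ok", "✅") if ok else ("badge-fail", "❌")
    return f'<span class="badge {css_class}">{icon} {html.escape(label)}</span>'


latency_ms, budget_ms = 85, 100  # illustrative values
print(constraint_badge(f"Latency < {budget_ms}ms", latency_ms < budget_ms))
```
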
---

## The Stakeholder Message Pattern

Every lab opens Act I with a stakeholder message that sets the scenario:

```python
_color = COLORS["BlueLine"]  # or regime-specific color
mo.Html(f"""
<div style="border-left:4px solid {_color}; background:{COLORS['BlueL']};
            border-radius:0 10px 10px 0; padding:16px 22px; margin:12px 0;">
    <div style="font-size:0.72rem; font-weight:700; color:{_color};
                text-transform:uppercase; letter-spacing:0.1em; margin-bottom:6px;">
        Incoming Message · [Persona Title]
    </div>
    <div style="font-style:italic; font-size:1.0rem; color:#1e293b; line-height:1.65;">
        "[Specific, quantified, urgent message from a named stakeholder]"
    </div>
</div>
""")
```

---

## The Prediction-vs-Reality Overlay Pattern

After Act I instruments run, always show:

```python
# One value per radio option (Rule 2: exactly 4 options)
_predicted = {"option_a": 10, "option_b": 100, "option_c": 1000, "option_d": 10000}[act1_pred.value]
_actual = computed_value  # from physics engine
# Symmetric error factor: 10x too high and 10x too low both read as "off by 10.0×"
_lo, _hi = sorted((_predicted, _actual))
_gap = _hi / _lo if _lo > 0 else float("inf")
mo.callout(mo.md(
    f"**You predicted {_predicted:,}. The actual value is {_actual:,.0f}. "
    f"You were off by {_gap:.1f}×.** [One sentence explaining the gap.]"
), kind="success" if _gap < 1.3 else "warn")
```

---

## Volume 1 Lab Assignments

| Lab | File to create | Chapter | Core Invariant | 2 Contexts |
|-----|---------------|---------|----------------|-----------|
| 01 | lab_01_ml_intro.py | introduction.qmd | D·A·M Triad, 9-order magnitude gap | Cloud vs TinyML |
| 02 | lab_02_ml_systems.py | ml_systems.qmd | Iron Law T=D/BW+O/R+L, Memory Wall | Cloud vs Edge |
| 03 | lab_03_ml_workflow.py | ml_workflow.qmd | MLOps feedback loop, silent degradation | Cloud vs Mobile |
| 04 | lab_04_data_engr.py | data_engineering.qmd | Data gravity, pipeline bottlenecks | Cloud vs Edge |
| 05 | lab_05_nn_compute.py | nn_computation.qmd | Activation cost, memory hierarchy | Cloud vs Mobile |
| 06 | lab_06_nn_arch.py | nn_architectures.qmd | Transformer attention O(n²), depth vs width | Cloud vs Edge |
| 07 | lab_07_ml_frameworks.py | frameworks.qmd | Kernel fusion, dispatch overhead | Cloud vs Edge |
| 08 | lab_08_model_train.py | training.qmd | Memory = weights+grads+optimizer+activations | Cloud vs Mobile |
| 09 | lab_09_data_selection.py | data_selection.qmd | Curriculum learning, selection cost | Cloud vs Edge |
| 10 | lab_10_model_compress.py | optimizations.qmd (model_compression) | Quantization/pruning Pareto frontier | Cloud vs Mobile |
| 11 | lab_11_hw_accel.py | hw_acceleration.qmd | Roofline Model, ridge point, MFU | Cloud vs Edge |
| 12 | lab_12_perf_bench.py | benchmarking.qmd | Benchmark validity, Amdahl's Law | Cloud vs Edge |
| 13 | lab_13_model_serving.py | model_serving.qmd | Little's Law, P99 vs avg latency | Cloud vs Mobile |
| 14 | lab_14_ml_ops.py | ml_ops.qmd | Drift detection, retraining cost | Cloud vs Edge |
| 15 | lab_15_responsible_engr.py | responsible_engr.qmd | Fairness-accuracy tradeoff, audit cost | Cloud vs Mobile |
| 16 | lab_16_ml_conclusion.py | conclusion.qmd | Synthesis: all invariants, cross-lab ledger | All 4 |

---

## Volume 2 Lab Assignments

| Lab | File to create | Chapter | Core Invariant | 2 Contexts |
|-----|---------------|---------|----------------|-----------|
| 01 | lab_01_introduction.py | introduction.qmd | Scale laws: single-node → fleet | Cloud vs Fleet |
| 02 | lab_02_compute_infra.py | compute_infrastructure.qmd | NVLink vs PCIe BW, interconnect wall | Single-node vs Multi-node |
| 03 | lab_03_network_fabrics.py | network_fabrics.qmd | Bisection BW, fat-tree topology | 8-GPU vs 1024-GPU |
| 04 | lab_04_data_storage.py | data_storage.qmd | Data gravity, I/O bottleneck | NVMe vs distributed FS |
| 05 | lab_05_dist_train.py | distributed_training.qmd | Parallelism Paradox, MFU at scale | DP vs 3D-Parallel |
| 06 | lab_06_collective_comms.py | collective_communication.qmd | AllReduce bandwidth, ring vs tree | Ring vs Tree topology |
| 07 | lab_07_fault_tolerance.py | fault_tolerance.qmd | Young-Daly optimal checkpoint interval | 8-GPU vs 16k-GPU |
| 08 | lab_08_fleet_orch.py | fleet_orchestration.qmd | Utilization vs queue latency | FIFO vs priority sched |
| 09 | lab_09_perf_engr.py | performance_engineering.qmd | Profile-guided optimization, Amdahl | Batch vs streaming |
| 10 | lab_10_dist_inference.py | inference.qmd | KV-cache memory, continuous batching | Latency vs throughput |
| 11 | lab_11_edge_intelligence.py | edge_intelligence.qmd | Federated learning communication cost | Centralized vs federated |
| 12 | lab_12_ops_scale.py | ops_scale.qmd | SLO budget allocation, cascading failure | K8s vs bare metal |
| 13 | lab_13_security_privacy.py | security_privacy.qmd | Differential privacy ε-δ tradeoff | On-prem vs cloud |
| 14 | lab_14_robust_ai.py | robust_ai.qmd | Adversarial robustness vs accuracy | Production vs hardened |
| 15 | lab_15_sustainable_ai.py | sustainable_ai.qmd | Jevons Paradox, carbon-aware scheduling | Coal region vs renewable |
| 16 | lab_16_responsible_ai.py | responsible_ai.qmd | Fairness metrics incompatibility | Accuracy vs equity |
| 17 | lab_17_ml_conclusion.py | conclusion.qmd | Synthesis: Vol1+Vol2 invariant audit | Full fleet |

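
For Lab 07's invariant, a minimal sketch of Young's first-order checkpoint interval, T* = sqrt(2CM), which Daly's higher-order result approaches when the checkpoint cost C is small relative to the job MTBF M (Daly's correction terms are omitted). It assumes independent failures, so job MTBF shrinks as 1/N with N GPUs; the 60 s checkpoint and 50,000 h per-GPU MTBF are illustrative values, not chapter figures:

```python
import math

# Young's first-order approximation: T* = sqrt(2 * C * M)
# C = time to write one checkpoint, M = mean time between failures of the whole job.


def job_mtbf_s(per_gpu_mtbf_s, num_gpus):
    """Job-level MTBF under independent failures: any one GPU failing stops the job."""
    return per_gpu_mtbf_s / num_gpus


def young_daly_interval_s(checkpoint_s, mtbf_s):
    return math.sqrt(2.0 * checkpoint_s * mtbf_s)


PER_GPU_MTBF_S = 50_000 * 3600  # illustrative
for n in (8, 16_384):           # the lab's two scales
    m = job_mtbf_s(PER_GPU_MTBF_S, n)
    t = young_daly_interval_s(60.0, m)
    print(f"{n:>6} GPUs: job MTBF {m / 3600:8.1f} h, T* {t / 60:6.1f} min")
```
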
---

## The Design Ledger Schema

Each lab saves exactly one `chNN` key. Downstream labs read prior keys.

```python
# Vol1 schema (N = this lab's chapter number)
ledger.save(chapter=N, design={
    "context": str,           # one of "cloud" | "edge" | "mobile" | "tiny"
    "act1_prediction": str,   # the radio/number value student chose
    "act1_correct": bool,
    "act2_result": float,     # key quantitative outcome
    "act2_decision": str,     # e.g. "quantize" | "prune" | "increase_batch"
    "constraint_hit": bool,   # did student trigger the failure state?
})
```

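One way to picture the schema's contract (exact keys, fixed types, one `chNN` key per lab) is a small in-memory stand-in. `SketchLedger` is hypothetical and is not the `labs.core.state.DesignLedger` implementation, whose storage and method names may differ:

```python
from typing import Optional

# Hypothetical in-memory stand-in, written only to illustrate the schema above.
SCHEMA = {
    "context": str,
    "act1_prediction": str,
    "act1_correct": bool,
    "act2_result": float,
    "act2_decision": str,
    "constraint_hit": bool,
}
CONTEXTS = {"cloud", "edge", "mobile", "tiny"}


class SketchLedger:
    def __init__(self):
        self._store = {}

    def save(self, chapter: int, design: dict) -> str:
        """Validate a design against SCHEMA and store it under a single chNN key."""
        if set(design) != set(SCHEMA):
            raise KeyError(f"design keys must be exactly {sorted(SCHEMA)}")
        for key, expected_type in SCHEMA.items():
            if not isinstance(design[key], expected_type):
                raise TypeError(f"{key} must be {expected_type.__name__}")
        if design["context"] not in CONTEXTS:
            raise ValueError(f"unknown context {design['context']!r}")
        key = f"ch{chapter:02d}"  # e.g. chapter 5 -> "ch05"
        self._store[key] = dict(design)  # saving again overwrites: one key per lab
        return key

    def load(self, chapter: int) -> Optional[dict]:
        """Downstream labs read earlier chapters' results; None if never saved."""
        return self._store.get(f"ch{chapter:02d}")
```
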
---

## What Good Looks Like — The Standard

Study `labs/vol1/lab_00_introduction.py` for:

- Header structure (dark gradient, constraint badges, time estimate)
- `mo.stop()` gating pattern
- `mo.callout(mo.md(...))` for all feedback
- `mo.ui.tabs()` for multi-section navigation
- Design Ledger HUD footer

The bar: if a student at Stanford in a graduate ML Systems course opened this lab,
they should feel that it is the most intellectually rigorous and well-crafted
interactive lab they have ever seen. Every slider range is justified by physics.
Every question is designed to produce productive failure. Every chart updates live.

---

## Import Reference (working paths, verified)

```python
from labs.core.state import DesignLedger                               # ✓ verified
from labs.core.style import COLORS, LAB_CSS, apply_plotly_theme        # ✓ verified
from labs.core.components import MathPeek, MetricRow, ComparisonRow    # ✓ verified
from mlsysim.core.hardware import Hardware   # Cloud.H100, Edge.JetsonOrinNX, etc.
from mlsysim.core.models import Models       # Language.Llama3_8B, Vision.ResNet50, etc.
from mlsysim.core.constants import (         # raw constants with units
    H100_MEM_BW, H100_FLOPS_FP16_TENSOR, H100_TDP,
    A100_MEM_BW, MOBILE_NPU_MEM_BW, ESP32_RAM,
)
```

Hardware constants for inline use (no pint units; plain Python numbers):

```python
# Cloud
H100_BW_GBS = 3350        # GB/s
H100_TFLOPS_FP16 = 1979   # TFLOPS, FP16 Tensor Core with 2:4 sparsity (dense: ~989)
H100_RAM_GB = 80          # GB HBM
H100_TDP_W = 700          # Watts

# Edge
ORIN_BW_GBS = 102         # GB/s
ORIN_TFLOPS = 100         # TFLOPS (INT8 equivalent)
ORIN_RAM_GB = 16          # GB
ORIN_TDP_W = 25           # Watts

# Mobile
MOBILE_BW_GBS = 68        # GB/s (Apple A17 class)
MOBILE_TOPS_INT8 = 35     # TOPS
MOBILE_RAM_GB = 8         # GB
MOBILE_TDP_W = 5          # Watts sustained

# TinyML
MCU_BW_GBS = 0.05         # GB/s
MCU_MFLOPS = 1            # MFLOPS (Cortex-M7)
MCU_SRAM_KB = 256         # KB
MCU_TDP_MW = 100          # milliwatts
```

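Dividing a context's compute constant by its bandwidth constant gives its ridge point, the arithmetic intensity at which the Roofline Model (Lab 11) switches from memory-bound to compute-bound. A minimal sketch; the intensities in the loop are illustrative, and because 1979 TFLOPS is NVIDIA's with-sparsity figure, a dense-math ridge point would be about half:

```python
# Roofline sketch: attainable = min(peak compute, arithmetic intensity x bandwidth).
H100_BW_GBS = 3350
H100_TFLOPS_FP16 = 1979


def ridge_point(peak_tflops, bw_gbs):
    """Arithmetic intensity (FLOP/byte) where the compute and memory ceilings meet."""
    return (peak_tflops * 1e12) / (bw_gbs * 1e9)


def attainable_tflops(intensity, peak_tflops, bw_gbs):
    """Roofline bound in TFLOPS for a kernel with the given FLOP/byte intensity."""
    return min(peak_tflops, intensity * bw_gbs * 1e9 / 1e12)


ridge = ridge_point(H100_TFLOPS_FP16, H100_BW_GBS)
print(f"H100 ridge point ≈ {ridge:.0f} FLOP/byte")
for ai in (2, 100, 1000):
    bound = attainable_tflops(ai, H100_TFLOPS_FP16, H100_BW_GBS)
    print(f"AI {ai:>4}: {bound:7.1f} TFLOPS ({'memory' if ai < ridge else 'compute'}-bound)")
```
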
---

## Syntax Verification

Before returning your output, mentally verify:

1. All `f"""..."""` strings with `{variable}` are proper f-strings (not `"""` without `f`)
2. No markdown `**text**` inside `mo.Html(...)`; use `mo.callout(mo.md(...))` instead
3. `mo.stop(condition, fallback_ui)`: condition is True when you WANT to stop
4. Every `@app.cell` function has `return` at the end (even if it returns nothing useful)
5. All widget variables returned from their defining cell are used in dependent cells

Then check that the file parses: `python3 -c "import ast; ast.parse(open('your_file.py').read())"` should run clean. If you cannot execute it, trace it mentally.

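Checks 2 and 4 can also be automated alongside the `ast.parse` call. A hypothetical sketch, not a tool in the repo; it only parses the source and executes nothing. Check 1 is left out because a brace inside a plain string is sometimes intentional:

```python
import ast

# Hypothetical checker automating checks 2 and 4 on top of the ast.parse syntax check.


def _is_cell_decorator(dec):
    target = dec.func if isinstance(dec, ast.Call) else dec
    return isinstance(target, ast.Attribute) and target.attr == "cell"


def _has_markdown_bold(call):
    return any(
        isinstance(sub, ast.Constant) and isinstance(sub.value, str) and "**" in sub.value
        for arg in call.args
        for sub in ast.walk(arg)
    )


def verify_lab_source(source):
    """Return human-readable problems; raises SyntaxError if the file does not parse."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        # Check 4: every @app.cell function ends with a return statement
        if isinstance(node, ast.FunctionDef) and any(_is_cell_decorator(d) for d in node.decorator_list):
            if not isinstance(node.body[-1], ast.Return):
                problems.append(f"line {node.lineno}: cell does not end with return")
        # Check 2: markdown bold inside mo.Html(...) renders as literal asterisks
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                and node.func.attr == "Html" and _has_markdown_bold(node)):
            problems.append(f"line {node.lineno}: markdown '**' inside mo.Html")
    return problems
```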