mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-03-09 07:15:51 -05:00


Vijay Janapa Reddi 6f5732558f feat: add complete first-draft labs for both volumes (33 Marimo labs)

Add all Vol1 (labs 01-16) and Vol2 (labs 01-17) interactive Marimo labs
as the first full first-pass implementation of the ML Systems curriculum labs.

Each lab follows the PROTOCOL 2-Act structure (35-40 min):
- Act I: Calibration with prediction lock → instruments → overlay
- Act II: Design challenge with failure states and reflection

Key pedagogical instruments introduced progressively:
- Vol1: D·A·M Triad, Iron Law, Memory Ledger, Roofline, Amdahl's Law,
  Little's Law, P99 Histogram, Compression Frontier, Chouldechova theorem
- Vol2: NVLink vs PCIe cliff, Bisection BW, Young-Daly T*, Parallelism Paradox,
  AllReduce ring vs tree, KV-cache model, Jevons Paradox, DP ε-δ tradeoff,
  SLO composition, Adversarial Pareto, two-volume synthesis capstone

All 35 staged files pass AST syntax verification (36/36 including lab_00).

Also includes:
- labs/LABS_SPEC.md: authoritative sub-agent brief for all lab conventions
- labs/core/style.py: expanded unified design system with semantic color tokens

2026-03-01 19:59:04 -05:00

16 KiB


MLSys Labs — Sub-Agent Build Specification

Gold Standard: Every Lab, Both Volumes

READ THIS ENTIRE DOCUMENT BEFORE WRITING A SINGLE LINE OF CODE.

This spec overrides all earlier plan documents.

Who You Are

You are a specialist lab developer for the Machine Learning Systems two-volume textbook. Your job: write ONE complete, runnable Marimo lab (.py file) that is the gold standard of pedagogical interactive content. Think: the best CS lab you ever encountered, combined with a real engineering cockpit.

You are NOT writing a demo. You are writing a structured confrontation with physics.

The Non-Negotiable Rules (PROTOCOL invariants)

Rule 1: 2-Act structure, 35-40 minutes total

Act I  — Calibration (12-15 min)
  One prediction lock → instruments reveal → structured reflection
Act II — Design Challenge (20-25 min)
  One numeric/radio prediction → full instrument set → failure state → reflection

No 3-KAT format. No 45-minute labs. If you write 3 acts, you have failed.
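The per-act ranges alone do not guarantee the total. 12 + 20 = 32 minutes passes both act limits but misses the 35-minute floor. Below is a minimal planning check. It assumes a lab plan can be summarized as two durations in minutes, and check_act_budget is a hypothetical helper, not part of labs.core.

```python
def check_act_budget(act1_min: float, act2_min: float) -> list[str]:
    """Return Rule 1 violations for a planned 2-act lab (an empty list means the plan is OK)."""
    problems = []
    if not 12 <= act1_min <= 15:
        problems.append(f"Act I is {act1_min} min; Rule 1 allows 12-15 min")
    if not 20 <= act2_min <= 25:
        problems.append(f"Act II is {act2_min} min; Rule 1 allows 20-25 min")
    total = act1_min + act2_min
    if not 35 <= total <= 40:
        problems.append(f"Total is {total} min; Rule 1 requires 35-40 min")
    return problems


print(check_act_budget(12, 20))  # each act is in range, but the 32-minute total is flagged
print(check_act_budget(15, 22))  # []
```

In practice, at least one act has to sit above its minimum.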

Rule 2: Structured predictions only — never free text

- Use mo.ui.radio(options={...}) with exactly 4 options, one correct, or
- Use mo.ui.number(start=X, stop=Y, step=Z) for bounded numeric entry.
- Gate with mo.stop(prediction.value is None, mo.callout(mo.md("Select your prediction to continue."), kind="warn"))
- AFTER the act: always show the prediction-vs-reality overlay with the exact gap.
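The radio constraints in Rule 2 can be checked before the widget is built. This sketch assumes the options are the same {label: value} dict passed to mo.ui.radio(options=...) and that the author knows which value is correct. validate_radio_options is a hypothetical helper, not a marimo or labs.core function.

```python
def validate_radio_options(options: dict[str, str], correct_value: str) -> None:
    """Raise ValueError unless options satisfy Rule 2: 4 distinct choices, one of them correct."""
    values = list(options.values())
    if len(values) != 4:
        raise ValueError(f"Rule 2 requires exactly 4 options, got {len(values)}")
    if len(set(values)) != len(values):
        raise ValueError("option values must be distinct")
    if correct_value not in values:
        raise ValueError(f"no option maps to the correct value {correct_value!r}")


# Labels are what students see; values are what act1_pred.value returns.
act1_options = {"~10×": "option_a", "~100×": "option_b", "~1,000×": "option_c", "~10,000×": "option_d"}
validate_radio_options(act1_options, correct_value="option_c")  # passes silently
```

Dict keys are already unique, so only the values need the distinctness check. Distinct values plus a present correct value means exactly one option is correct.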

Rule 3: All check feedback uses mo.callout(mo.md(...))

NEVER inject markdown into raw HTML strings. Markdown is not rendered there, so **bold** shows up as literal asterisks. Correct pattern:

mo.callout(mo.md("**Correct.** The explanation here with *italic* and **bold**."), kind="success")
mo.callout(mo.md("**Not quite.** The explanation here."), kind="warn")

Rule 4: At least one failure state in Act II

Every Act II must have an instrument that turns red / shows a banner when the student's design violates a physical constraint. The failure must be reversible.

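_oom = memory_gb > device_ram_gb
# marimo only displays a cell's last expression, so a callout inside `if` is never shown.
# Build the banner conditionally and leave it as the final expression instead.
_banner = mo.callout(mo.md(f"🔴 **OOM — infeasible.** Required: {memory_gb:.1f} GB | Available: {device_ram_gb:.1f} GB"), kind="danger") if _oom else None
_banner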
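A failure state is easiest to keep reversible when feasibility is a pure function of the current slider values. It is then recomputed on every change instead of being stored as a flag that latches. The sketch below works under that assumption. check_memory_fit and the FitResult shape are hypothetical names.

```python
from typing import NamedTuple


class FitResult(NamedTuple):
    feasible: bool
    headroom_gb: float  # negative when the design does not fit
    message: str


def check_memory_fit(memory_gb: float, device_ram_gb: float) -> FitResult:
    """Recompute feasibility from the current inputs; nothing is cached, so fixing the design clears the failure."""
    headroom = device_ram_gb - memory_gb
    if headroom < 0:
        return FitResult(False, headroom,
                         f"OOM: infeasible. Required {memory_gb:.1f} GB, available {device_ram_gb:.1f} GB")
    return FitResult(True, headroom, f"Fits with {headroom:.1f} GB headroom")


print(check_memory_fit(96.0, 80.0).feasible)  # over an 80 GB device
print(check_memory_fit(64.0, 80.0).feasible)  # after the student shrinks the design
```

In a lab cell, the result can drive the banner directly, for example mo.callout(mo.md(result.message), kind="success" if result.feasible else "danger").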

Rule 5: 2 deployment contexts as comparison toggle, NOT 4 narrative tracks

Each lab picks the 2 contexts most relevant to its chapter invariant:

Cloud: H100 (80 GB HBM, 3.35 TB/s BW, 700W TDP)
Edge: Jetson Orin NX (16 GB, 102 GB/s BW, 25W TDP)
Mobile: Smartphone NPU (8 GB, 68 GB/s BW, 5W sustained)
TinyML: Cortex-M7 (256 KB SRAM, 0.05 GB/s BW, 0.1W)

Toggle pattern:

context_toggle = mo.ui.radio(
    options={"☁️ Cloud (H100)": "cloud", "🤖 Edge (Jetson Orin NX)": "edge"},
    label="Deployment context:", inline=True
)
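The four context specs above can live in one lookup keyed by the toggle's values, so downstream cells read hardware numbers from a single place. This is a sketch: the DeploymentContext dataclass, the CONTEXTS dict, and stream_seconds are hypothetical names. The figures are copied from the Rule 5 list, with the TinyML 256 KB SRAM expressed in (decimal) GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentContext:
    label: str
    ram_gb: float
    bw_gbs: float   # memory bandwidth, GB/s
    power_w: float  # TDP or sustained power budget, W


# Keys match the values returned by the context toggle; figures copied from Rule 5.
CONTEXTS = {
    "cloud": DeploymentContext("H100", ram_gb=80, bw_gbs=3350, power_w=700),
    "edge": DeploymentContext("Jetson Orin NX", ram_gb=16, bw_gbs=102, power_w=25),
    "mobile": DeploymentContext("Smartphone NPU", ram_gb=8, bw_gbs=68, power_w=5),
    "tiny": DeploymentContext("Cortex-M7", ram_gb=256e-6, bw_gbs=0.05, power_w=0.1),  # 256 KB SRAM
}


def stream_seconds(ctx: DeploymentContext, gigabytes: float) -> float:
    """Lower bound on the time to read `gigabytes` once at the context's peak memory bandwidth."""
    return gigabytes / ctx.bw_gbs


for key in ("cloud", "edge"):
    # 16 GB is roughly an 8B-parameter model in FP16
    print(key, f"{stream_seconds(CONTEXTS[key], 16) * 1e3:.1f} ms")
```

A dependent cell would then use ctx = CONTEXTS[context_toggle.value] and never hard-code per-context numbers.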

Rule 6: Zero instruments before their chapter introduction

Lab	First new instrument
01	Magnitude Gap slider, D·A·M comparison
02	Latency Waterfall
05	Memory Ledger, Activation Comparator
09	Pareto Curve
10	Compression Trade-off Frontier
11	Roofline Model
13	P99 Latency Histogram
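Rule 6 can be enforced mechanically if each lab declares the instruments it uses. This sketch assumes such a declaration exists: the per-lab uses set is hypothetical, and INTRODUCED_IN simply mirrors the table above. Instruments missing from the mapping are not gated by this check.

```python
# Lab in which each instrument is first introduced (copied from the Rule 6 table).
INTRODUCED_IN = {
    "Magnitude Gap slider": 1,
    "D·A·M comparison": 1,
    "Latency Waterfall": 2,
    "Memory Ledger": 5,
    "Activation Comparator": 5,
    "Pareto Curve": 9,
    "Compression Trade-off Frontier": 10,
    "Roofline Model": 11,
    "P99 Latency Histogram": 13,
}


def premature_instruments(lab: int, uses: set[str]) -> list[str]:
    """Instruments a lab uses before the lab that introduces them (Rule 6 violations), sorted."""
    return sorted(name for name in uses if INTRODUCED_IN.get(name, 0) > lab)


print(premature_instruments(5, {"Memory Ledger", "Roofline Model"}))  # Roofline belongs to lab 11
```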

Rule 7: Every number traces to a chapter claim

Never invent thresholds or slider ranges. Every value must come from the chapter text. Comment each constant with its source:

H100_BW_GBS = 3350  # H100 SXM5 HBM3, NVIDIA spec
SRAM_WALL_KB = 256  # Cortex-M7 typical on-chip SRAM ceiling
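A quick Rule 7 audit flags numeric constants that lack a trailing source comment. This regex-based sketch assumes constants are written one per line in UPPER_CASE = number form, as above. It is a heuristic, not a parser, and uncommented_constants is a hypothetical name.

```python
import re

# Optional indentation, an UPPER_CASE name, "=", a numeric literal, then an optional trailing comment.
CONSTANT_LINE = re.compile(r"^\s*([A-Z][A-Z0-9_]*)\s*=\s*[-+]?\d[\d_.eE+-]*\s*(#.*)?$")


def uncommented_constants(source: str) -> list[str]:
    """Names of one-line numeric constants whose trailing comment is missing or empty."""
    missing = []
    for line in source.splitlines():
        match = CONSTANT_LINE.match(line)
        if match and not (match.group(2) or "").strip("# ").strip():
            missing.append(match.group(1))
    return missing


print(uncommented_constants("H100_BW_GBS = 3350  # H100 SXM5, NVIDIA spec\nMAGIC = 42\n"))  # flags MAGIC
```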

Rule 8: hide_code=True on all cells except the setup cell

Students see outputs, not implementation. Every @app.cell decorator becomes @app.cell(hide_code=True). Exception: the first (imports) cell stays visible so instructors can inspect it.

Rule 9: All markdown feedback via mo.md(), all text in mo.callout()

The pattern for every concept explanation:

mo.callout(mo.md("**Key insight:** explanation with *emphasis* and `code` notation."), kind="info")

Rule 10: MathPeek accordion on every act

mo.accordion({
    "📐 The governing equation": mo.md("""
    **Formula:** `T = D/BW + O/R + L`
    - **T** — total latency ...
    """)
})
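The MathPeek formula is also a one-line function, which makes it easy to reuse in instruments and to sanity-check numbers. This sketch assumes SI units (bytes, bytes/s, operations, operations/s, seconds) and adds the terms with no overlap, as the formula is written. The workload is hypothetical: one decode step of an 8B-parameter FP16 model on an H100. The 989 TFLOPS dense FP16 peak is an assumption of this example.

```python
def iron_law_latency(d_bytes: float, bw_bytes_per_s: float,
                     ops: float, rate_ops_per_s: float, overhead_s: float) -> float:
    """T = D/BW + O/R + L, in seconds: data movement + compute + fixed overhead (no overlap assumed)."""
    return d_bytes / bw_bytes_per_s + ops / rate_ops_per_s + overhead_s


# Hypothetical single decode step of an 8B-parameter FP16 model on an H100:
D = 16e9      # bytes: 8e9 parameters x 2 bytes, read once
BW = 3.35e12  # bytes/s: 3.35 TB/s HBM
O = 16e9      # FLOPs: ~2 per parameter per token
R = 989e12    # FLOP/s: dense FP16 Tensor Core peak (assumption)
t = iron_law_latency(D, BW, O, R, overhead_s=0.0)
print(f"{t * 1e3:.2f} ms")  # the D/BW term (~4.78 ms) dwarfs O/R (~0.016 ms)
```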

File Structure Template

import marimo
__generated_with = "0.19.6"
app = marimo.App(width="full")

# ─── CELL 0: SETUP (hide_code=False — leave visible) ───────────────────────
@app.cell
def _():
    import marimo as mo
    import sys
    from pathlib import Path
    import plotly.graph_objects as go
    import numpy as np

    _root = Path(__file__).resolve().parents[2]
    if str(_root) not in sys.path:
        sys.path.insert(0, str(_root))

    from labs.core.state import DesignLedger
    from labs.core.style import COLORS, LAB_CSS, apply_plotly_theme
    from mlsysim.core.hardware import Hardware
    from mlsysim.core.models import Models

    ledger = DesignLedger()
    return mo, ledger, COLORS, LAB_CSS, apply_plotly_theme, Hardware, Models, go, np

# ─── CELL 1: HEADER (hide_code=True) ────────────────────────────────────────
@app.cell(hide_code=True)
def _(mo, LAB_CSS, ledger):
    # Dark gradient header with constraint badges
    # See lab_00_introduction.py for reference
    return

# ─── CELL 2: RECOMMENDED READING (hide_code=True) ───────────────────────────
@app.cell(hide_code=True)
def _(mo):
    mo.callout(mo.md("""
    📖 **Recommended Reading** — Complete the following chapter sections before this lab:
    - Section X: [Topic] — [one-line description of what to read]
    - Section Y: [Topic] — [one-line description]
    """), kind="info")

# ─── CELL 3: CONTEXT TOGGLE + LOAD LEDGER (hide_code=True) ─────────────────
@app.cell(hide_code=True)
def _(mo, ledger):
    # 2-context comparison toggle
    # Load deployment context from Design Ledger
    return

# ─── ACT I CELLS ─────────────────────────────────────────────────────────────
# Concept intro → prediction lock → instruments → reveal → reflection → MathPeek

# ─── ACT II CELLS ────────────────────────────────────────────────────────────
# Design challenge intro → prediction → instruments → failure state → reflection

# ─── LEDGER SAVE + HUD (hide_code=True) ─────────────────────────────────────
@app.cell(hide_code=True)
def _(mo, ledger, COLORS):
    # Save chapter results to Design Ledger
    # Render HUD footer
    return

if __name__ == "__main__":
    app.run()

Design Language (CSS Classes from labs/core/style.py)

# Import once in setup cell:
from labs.core.style import COLORS, LAB_CSS, apply_plotly_theme

# Color tokens:
COLORS['BlueLine']   # #006395  primary data
COLORS['GreenLine']  # #008F45  success / target met
COLORS['RedLine']    # #CB202D  failure / violation
COLORS['OrangeLine'] # #CC5500  warning / caution

# Deployment regime accent colors:
COLORS['Cloud']  # #6366f1  indigo
COLORS['Edge']   # #CB202D  red
COLORS['Mobile'] # #CC5500  orange
COLORS['Tiny']   # #008F45  green

Constraint badge HTML pattern (use in header):

<span class="badge badge-ok">✅ Latency < 100ms</span>
<span class="badge badge-fail">❌ Power > Budget</span>

The Stakeholder Message Pattern

Every lab opens Act I with a stakeholder message that sets the scenario:

_color = COLORS["BlueLine"]  # or regime-specific color
mo.Html(f"""
<div style="border-left:4px solid {_color}; background:{COLORS['BlueL']};
            border-radius:0 10px 10px 0; padding:16px 22px; margin:12px 0;">
    <div style="font-size:0.72rem; font-weight:700; color:{_color};
                text-transform:uppercase; letter-spacing:0.1em; margin-bottom:6px;">
        Incoming Message · [Persona Title]
    </div>
    <div style="font-style:italic; font-size:1.0rem; color:#1e293b; line-height:1.65;">
        "[Specific, quantified, urgent message from a named stakeholder]"
    </div>
</div>
""")

The Prediction-vs-Reality Overlay Pattern

After Act I instruments run, always show:

# Rule 2: exactly 4 options; keys are the radio values
_predicted = {"option_a": 10, "option_b": 100, "option_c": 1000, "option_d": 10000}[act1_pred.value]
_actual = computed_value  # from physics engine
# Fold the ratio so 10× too high and 10× too low both read as "off by 10.0×"
_gap = max(_actual / _predicted, _predicted / _actual) if _predicted > 0 and _actual > 0 else float('inf')
mo.callout(mo.md(
    f"**You predicted {_predicted:,}. The actual value is {_actual:,.0f}. "
    f"You were off by {_gap:.1f}×.** [One sentence explaining the gap.]"
), kind="success" if _gap < 1.3 else "warn")

Volume 1 Lab Assignments

Lab	File to create	Chapter	Core Invariant	2 Contexts
01	lab_01_ml_intro.py	introduction.qmd	D·A·M Triad, 9-order magnitude gap	Cloud vs TinyML
02	lab_02_ml_systems.py	ml_systems.qmd	Iron Law T=D/BW+O/R+L, Memory Wall	Cloud vs Edge
03	lab_03_ml_workflow.py	ml_workflow.qmd	MLOps feedback loop, silent degradation	Cloud vs Mobile
04	lab_04_data_engr.py	data_engineering.qmd	Data gravity, pipeline bottlenecks	Cloud vs Edge
05	lab_05_nn_compute.py	nn_computation.qmd	Activation cost, memory hierarchy	Cloud vs Mobile
06	lab_06_nn_arch.py	nn_architectures.qmd	Transformer attention O(n²), depth vs width	Cloud vs Edge
07	lab_07_ml_frameworks.py	frameworks.qmd	Kernel fusion, dispatch overhead	Cloud vs Edge
08	lab_08_model_train.py	training.qmd	Memory = weights+grads+optimizer+activations	Cloud vs Mobile
09	lab_09_data_selection.py	data_selection.qmd	Curriculum learning, selection cost	Cloud vs Edge
10	lab_10_model_compress.py	optimizations.qmd (model_compression)	Quantization/pruning Pareto frontier	Cloud vs Mobile
11	lab_11_hw_accel.py	hw_acceleration.qmd	Roofline Model, ridge point, MFU	Cloud vs Edge
12	lab_12_perf_bench.py	benchmarking.qmd	Benchmark validity, Amdahl's Law	Cloud vs Edge
13	lab_13_model_serving.py	model_serving.qmd	Little's Law, P99 vs avg latency	Cloud vs Mobile
14	lab_14_ml_ops.py	ml_ops.qmd	Drift detection, retraining cost	Cloud vs Edge
15	lab_15_responsible_engr.py	responsible_engr.qmd	Fairness-accuracy tradeoff, audit cost	Cloud vs Mobile
16	lab_16_ml_conclusion.py	conclusion.qmd	Synthesis: all invariants, cross-lab ledger	All 4

Volume 2 Lab Assignments

Lab	File to create	Chapter	Core Invariant	2 Contexts
01	lab_01_introduction.py	introduction.qmd	Scale laws: single-node → fleet	Cloud vs Fleet
02	lab_02_compute_infra.py	compute_infrastructure.qmd	NVLink vs PCIe BW, interconnect wall	Single-node vs Multi-node
03	lab_03_network_fabrics.py	network_fabrics.qmd	Bisection BW, fat-tree topology	8-GPU vs 1024-GPU
04	lab_04_data_storage.py	data_storage.qmd	Data gravity, I/O bottleneck	NVMe vs distributed FS
05	lab_05_dist_train.py	distributed_training.qmd	Parallelism Paradox, MFU at scale	DP vs 3D-Parallel
06	lab_06_collective_comms.py	collective_communication.qmd	AllReduce bandwidth, ring vs tree	Ring vs Tree topology
07	lab_07_fault_tolerance.py	fault_tolerance.qmd	Young-Daly optimal checkpoint interval	8-GPU vs 16k-GPU
08	lab_08_fleet_orch.py	fleet_orchestration.qmd	Utilization vs queue latency	FIFO vs priority sched
09	lab_09_perf_engr.py	performance_engineering.qmd	Profile-guided optimization, Amdahl	Batch vs streaming
10	lab_10_dist_inference.py	inference.qmd	KV-cache memory, continuous batching	Latency vs throughput
11	lab_11_edge_intelligence.py	edge_intelligence.qmd	Federated learning communication cost	Centralized vs federated
12	lab_12_ops_scale.py	ops_scale.qmd	SLO budget allocation, cascading failure	K8s vs bare metal
13	lab_13_security_privacy.py	security_privacy.qmd	Differential privacy ε-δ tradeoff	On-prem vs cloud
14	lab_14_robust_ai.py	robust_ai.qmd	Adversarial robustness vs accuracy	Production vs hardened
15	lab_15_sustainable_ai.py	sustainable_ai.qmd	Jevons Paradox, carbon-aware scheduling	Coal region vs renewable
16	lab_16_responsible_ai.py	responsible_ai.qmd	Fairness metrics incompatibility	Accuracy vs equity
17	lab_17_ml_conclusion.py	conclusion.qmd	Synthesis: Vol1+Vol2 invariant audit	Full fleet

The Design Ledger Schema

Each lab saves exactly one chNN key. Downstream labs read prior keys.

# Vol1 schema
ledger.save(chapter=N, design={
    "context":        "cloud" | "edge" | "mobile" | "tiny",
    "act1_prediction": str,    # the radio/number value student chose
    "act1_correct":   bool,
    "act2_result":    float,   # key quantitative outcome
    "act2_decision":  str,     # e.g. "quantize" | "prune" | "increase_batch"
    "constraint_hit": bool,    # did student trigger the failure state?
})
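Python cannot use "cloud" | "edge" | ... as a runtime value, so the schema above is notation rather than executable code. One way to make it checkable is a TypedDict plus a small validator, as sketched below. Vol1Design and validate_design are hypothetical names, and this spec does not say whether DesignLedger.save validates its input.

```python
from typing import Literal, TypedDict, get_args

Context = Literal["cloud", "edge", "mobile", "tiny"]


class Vol1Design(TypedDict):
    context: Context
    act1_prediction: str   # the radio/number value the student chose
    act1_correct: bool
    act2_result: float     # key quantitative outcome
    act2_decision: str     # e.g. "quantize" | "prune" | "increase_batch"
    constraint_hit: bool   # did the student trigger the failure state?


def validate_design(design: dict) -> Vol1Design:
    """Raise ValueError if a Vol1 ledger entry is missing keys or names an unknown context."""
    missing = Vol1Design.__required_keys__ - set(design)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if design["context"] not in get_args(Context):
        raise ValueError(f"unknown context: {design['context']!r}")
    return design  # type: ignore[return-value]
```

Usage would look like ledger.save(chapter=N, design=validate_design({...})), so that a malformed entry fails in the lab that writes it rather than in a downstream lab that reads it.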

What Good Looks Like — The Standard

Study labs/vol1/lab_00_introduction.py for:

- Header structure (dark gradient, constraint badges, time estimate)
- mo.stop() gating pattern
- mo.callout(mo.md(...)) for all feedback
- mo.ui.tabs() for multi-section navigation
- Design Ledger HUD footer

The bar: if a student at Stanford in a graduate ML Systems course opened this lab, they should feel that it is the most intellectually rigorous and well-crafted interactive lab they have ever seen. Every slider range is justified by physics. Every question is designed to produce productive failure. Every chart updates live.

Import Reference (working paths, verified)

from labs.core.state import DesignLedger       # ✓ verified
from labs.core.style import COLORS, LAB_CSS, apply_plotly_theme  # ✓ verified
from labs.core.components import MathPeek, MetricRow, ComparisonRow  # ✓ verified
from mlsysim.core.hardware import Hardware     # Cloud.H100, Edge.JetsonOrinNX, etc.
from mlsysim.core.models import Models         # Language.Llama3_8B, Vision.ResNet50, etc.
from mlsysim.core.constants import (           # raw constants with units
    H100_MEM_BW, H100_FLOPS_FP16_TENSOR, H100_TDP,
    A100_MEM_BW, MOBILE_NPU_MEM_BW, ESP32_RAM,
)

Hardware constants for inline use (no pint units; plain Python numbers):

# Cloud
H100_BW_GBS      = 3350   # GB/s
H100_TFLOPS_FP16 = 1979   # TFLOPS, FP16 Tensor Core with 2:4 sparsity (dense ≈ 989)
H100_RAM_GB      = 80     # GB HBM
H100_TDP_W       = 700    # Watts

# Edge
ORIN_BW_GBS      = 102    # GB/s
ORIN_TFLOPS      = 100    # TOPS, INT8 (not FP TFLOPS; NVIDIA's headline figure assumes sparsity)
ORIN_RAM_GB      = 16     # GB
ORIN_TDP_W       = 25     # Watts

# Mobile
MOBILE_BW_GBS    = 68     # GB/s (Apple A17 class)
MOBILE_TOPS_INT8 = 35     # TOPS
MOBILE_RAM_GB    = 8      # GB
MOBILE_TDP_W     = 5      # Watts sustained

# TinyML
MCU_BW_GBS       = 0.05   # GB/s
MCU_MFLOPS       = 1      # MFLOPS (Cortex-M7)
MCU_SRAM_KB      = 256    # KB
MCU_TDP_MW       = 100    # milliwatts
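These constants are enough for the Lab 11 roofline. The sketch below uses the standard roofline bound, attainable = min(peak compute, arithmetic intensity × memory bandwidth), in the GB/s and TFLOPS units used above. The peak you plug in (dense vs. sparse, FP16 vs. INT8) moves the ridge point, so a lab should state which one it uses. The function names are hypothetical.

```python
def attainable_tflops(intensity_flop_per_byte: float, peak_tflops: float, bw_gbs: float) -> float:
    """Roofline bound: min(compute roof, memory roof). GB/s x FLOP/byte = GFLOP/s; /1000 gives TFLOP/s."""
    memory_roof_tflops = intensity_flop_per_byte * bw_gbs / 1000
    return min(peak_tflops, memory_roof_tflops)


def ridge_point(peak_tflops: float, bw_gbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which the memory roof meets the compute roof."""
    return peak_tflops * 1000 / bw_gbs


H100_BW_GBS = 3350        # values from the constants block above
H100_TFLOPS_FP16 = 1979

print(f"H100 ridge point: {ridge_point(H100_TFLOPS_FP16, H100_BW_GBS):.0f} FLOP/byte")
print(attainable_tflops(2.0, H100_TFLOPS_FP16, H100_BW_GBS))  # memory-bound: 6.7 TFLOPS
```

Kernels whose arithmetic intensity falls left of the ridge point are memory-bound. Adding compute throughput does not help them, which is the comparison the Cloud vs Edge toggle in Lab 11 is meant to expose.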

Syntax Verification

Before returning your output, mentally verify:

- All f"""...""" strings with {variable} are proper f-strings (not """ without f)
- No markdown **text** inside mo.Html(...); use mo.callout(mo.md(...)) instead
- mo.stop(condition, fallback_ui): condition is True when you WANT to stop
- Every @app.cell function ends with return (a bare return is fine when the cell exports nothing)
- Every widget is returned from its defining cell, used in at least one dependent cell, and its .value is never read in the cell that creates it

Run mentally: python3 -c "import ast; ast.parse(open('your_file.py').read())" — should be clean.
