cs249r_book/mlsysim/docs/for-instructors.qmd
Vijay Janapa Reddi 85a58c65c2 fix(slides): repair blank-pages and Vol1/Vol2 collision in release PDFs
Two issues caused the deployed slide PDFs to be unusable:

1. Every chapter .tex declared `\setsansfont{Helvetica Neue}` — proprietary
   to Apple, not installed on the Ubuntu CI runner. xelatex bombed mid-frame,
   the workflow's `|| true` swallowed the error, and the resulting PDF had
   most text never typeset (blank pages with only logos/rules surviving).
   Switch all 35 decks to TeX Gyre Heros (sans) and TeX Gyre Cursor (mono),
   both bundled with texlive-fonts-extra — no external font downloads needed.
   Drop the JetBrains Mono wget step and fonts-liberation from both slide
   workflows accordingly.

2. Vol1 and Vol2 each ship `00_course_overview.pdf` and `01_introduction.pdf`.
   The publish workflow uploaded them to a flat GitHub Release namespace, so
   the second upload silently overwrote the first — clicking Vol I's Course
   Overview actually downloaded Vol II's deck. Stage prefixed copies
   (vol1_*.pdf, vol2_*.pdf) before upload, and update slides/vol{1,2}.qmd
   plus the mlsysim cross-links to point at the new prefixed URLs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:35:11 -04:00


---
title: "For Instructors"
subtitle: "Reproducible, hardware-independent exercises — paired with 35 lecture decks and 266 diagrams."
---
MLSYSIM provides a framework for assigning analytically grounded problem sets where every answer is deterministic and reproducible, regardless of what hardware your students have access to. Combined with the companion [lecture slides](https://mlsysbook.ai/slides/), it forms a complete teaching toolkit for ML systems courses.

---
## Why MLSYSIM for Teaching?
| Challenge | How MLSYSIM Helps |
|:----------|:------------------|
| Students lack GPU access | All analysis runs on a laptop — no cloud credits needed |
| Homework answers vary by hardware | Vetted registry specs produce identical results everywhere |
| Hard to grade open-ended systems questions | Analytical solvers give deterministic, verifiable outputs |
| Specifications become stale | Registry updated from official datasheets; one update propagates everywhere |
| Students memorize without understanding | "Predict first" exercises build genuine intuition |
| No time to build slides from scratch | 35 Beamer decks with speaker notes, active learning, and SVG diagrams ready to use |
---
## The Teaching Ecosystem
MLSYSIM is one component of a larger open teaching toolkit:

| Resource | What It Provides | Link |
|:---------|:-----------------|:-----|
| **Textbook** | Two-volume open textbook — foundations (Vol I) and scale (Vol II) | [mlsysbook.ai](https://mlsysbook.ai) |
| **Lecture Slides** | 35 Beamer decks, 1,099 slides, 266 SVG diagrams, speaker notes on every slide | [Slides Portal](https://mlsysbook.ai/slides/) |
| **MLSYSIM** | 6 analytical solvers, typed hardware registry, deterministic assignments | [Getting Started](getting-started.qmd) |
| **TinyML Courseware** | 4-course sequence with 178 slide decks for embedded ML | [TinyML Slides](https://mlsysbook.ai/slides/tinyml.html) |
| **Teaching Guide** | 16-week semester plans, active learning taxonomy, customization guide | [Teaching Guide](https://mlsysbook.ai/slides/teaching.html) |
---
## Course Integration Patterns
### Pattern 1 — Textbook Companion (Full Semester)
Map MLSYSIM tutorials and assignments directly to textbook chapters and lecture decks. The table below shows selected weeks from one possible 16-week arrangement using Volume I slides.

| Week | Lecture Slides | Textbook Topic | MLSYSIM Assignment |
|:-----|:---------------|:---------------|:-------------------|
| 2 | [Introduction](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_01_introduction.pdf) | The Iron Law of ML Systems | Read [Hello, Roofline](tutorials/00_hello_roofline.qmd) warmup — identify bottleneck equation |
| 5 | [NN Computation](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_05_nn_computation.pdf) | FLOPs, memory footprint | [Hello, Roofline](tutorials/00_hello_roofline.qmd) — roofline analysis, batch size sweep |
| 8 | [Model Training](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_08_training.pdf) | Training memory budget | [Solver Guide](solver-guide.qmd) — TrainingStateSolver, ZeRO stages |
| 11 | [HW Acceleration](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_11_hw_acceleration.pdf) | Roofline model, accelerator comparison | Hardware comparison assignment (see below) |
| 13 | [Model Serving](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_13_model_serving.pdf) | TTFT, ITL, KV-cache | [Two Phases of Inference](tutorials/02_two_phases.qmd) — serving latency analysis |
For a **Volume II** course on distributed systems:

| Week | Lecture Slides | Textbook Topic | MLSYSIM Assignment |
|:-----|:---------------|:---------------|:-------------------|
| 3 | [Compute Infrastructure](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_02_compute_infrastructure.pdf) | GPU clusters, interconnects | TCO analysis with EconomicsModel |
| 5 | [Distributed Training](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_05_distributed_training.pdf) | 3D parallelism, scaling | [Scaling to 1000 GPUs](tutorials/06_scaling_1000_gpus.qmd) — parallelism strategies |
| 7 | [Fault Tolerance](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_07_fault_tolerance.pdf) | Checkpointing, MTBF | ReliabilityModel — Young-Daly checkpoint interval |
| 10 | [Performance Engineering](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_10_performance_engineering.pdf) | Profiling, optimization | Multi-solver composition (see capstone ideas below) |
| 15 | [Sustainable AI](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_15_sustainable_ai.pdf) | Energy, carbon, water | [Geography is a Systems Variable](tutorials/07_geography.qmd) — carbon footprint |
::: {.callout-tip}
## Semester Plans
The [Teaching Guide](https://mlsysbook.ai/slides/teaching.html#suggested-semester-plans) provides complete 16-week schedules for Volume I, Volume II, and a combined 32-week sequence — with timing estimates for every deck.
:::
### Pattern 2 — Standalone Labs
Use individual tutorials as self-contained lab assignments in any systems course. Each tutorial includes exercises with clear expected outputs:

| Tutorial | Duration | Key Concepts | Pairs With Slides |
|:---------|:---------|:-------------|:------------------|
| [Hello, Roofline](tutorials/00_hello_roofline.qmd) | 15 min | Roofline model, memory vs. compute bound | [HW Acceleration](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_11_hw_acceleration.pdf) |
| [Geography is a Systems Variable](tutorials/07_geography.qmd) | 20 min | Energy, carbon footprint, regional grids | [Sustainable AI](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_15_sustainable_ai.pdf) |
| [Two Phases of Inference](tutorials/02_two_phases.qmd) | 25 min | TTFT vs. ITL, KV-cache pressure | [Model Serving](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_13_model_serving.pdf) |
| [Scaling to 1000 GPUs](tutorials/06_scaling_1000_gpus.qmd) | 30 min | Data/tensor/pipeline parallelism | [Distributed Training](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_05_distributed_training.pdf) |
### Pattern 3 — Capstone Projects
Advanced students compose multiple solvers to answer research-style questions. See [Writing a Custom Solver](solver-guide.qmd#writing-a-custom-solver) for the custom solver API.

---
## Assignment Ideas
### Homework: Hardware Comparison (30 min)
> Using `Engine.solve()`, compare ResNet-50 inference latency on the A100, H100, and Jetson AGX at batch sizes 1, 32, and 256. For each configuration, state whether the workload is memory-bound or compute-bound and explain why the bottleneck shifts with batch size.
**Pairs with**: [HW Acceleration slides](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_11_hw_acceleration.pdf) (roofline model, ridge point) and [Benchmarking slides](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_12_benchmarking.pdf) (measurement methodology).
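
For grading the "explain why the bottleneck shifts" part, it can help to have a model-free illustration of the effect. The sketch below does not use `Engine.solve()` or ResNet-50. It assumes a single hypothetical 4096×4096 FP16 dense layer and a hypothetical accelerator (300 TFLOP/s peak, 2 TB/s bandwidth), and shows how arithmetic intensity grows with batch size because the weights are read once while the FLOPs scale with the batch.

```python
def dense_layer_intensity(batch, d_in, d_out, bytes_per_elem=2):
    """Arithmetic intensity (FLOP/byte) of y = x @ W for one dense layer."""
    flops = 2 * batch * d_in * d_out                      # one multiply and one add per MAC
    weight_bytes = d_in * d_out * bytes_per_elem          # weights are read once per batch
    act_bytes = batch * (d_in + d_out) * bytes_per_elem   # read inputs, write outputs
    return flops / (weight_bytes + act_bytes)


def regime(intensity, peak_flops, mem_bw):
    """Classify a workload against the roofline ridge point (peak / bandwidth)."""
    ridge = peak_flops / mem_bw
    return "compute-bound" if intensity >= ridge else "memory-bound"


# Hypothetical accelerator, chosen only for illustration (not a registry entry).
PEAK_FLOPS = 300e12   # FLOP/s
MEM_BW = 2e12         # bytes/s

for batch in (1, 32, 256):
    ai = dense_layer_intensity(batch, 4096, 4096)
    print(f"batch={batch:>3}  intensity={ai:7.2f} FLOP/B  {regime(ai, PEAK_FLOPS, MEM_BW)}")
```

Under these assumed numbers, batch sizes 1 and 32 fall below the 150 FLOP/byte ridge point and batch size 256 rises above it. A student's explanation should follow the same reasoning even though the solver's numbers for ResNet-50 will differ.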
### Homework: Training Memory Budget (30 min)
> Using the TrainingStateSolver, calculate the memory required to train GPT-2 (1.5B parameters) in FP16 with the Adam optimizer under ZeRO Stage 0, Stage 1, and Stage 3. State the number of data-parallel GPUs you assume, since the per-GPU savings depend on it. Explain *why* each stage reduces memory and what trade-off it introduces.
**Pairs with**: [Model Training slides](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_08_training.pdf) and [Distributed Training slides](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_05_distributed_training.pdf).
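
As a sanity check on an answer key, the model-state memory can be worked by hand using the standard mixed-precision Adam accounting from the ZeRO literature: 2 bytes per parameter for FP16 weights, 2 for FP16 gradients, and 12 for FP32 optimizer state. The sketch below is not the TrainingStateSolver API. The function name and the choice of 8 data-parallel GPUs are assumptions, and activations and temporary buffers are deliberately excluded.

```python
def zero_model_state_bytes(num_params, stage, dp_degree):
    """Per-GPU model-state memory (bytes) for mixed-precision Adam under ZeRO.

    Counts weights, gradients, and optimizer state only; activations and
    temporary buffers are not included.
    """
    params = 2 * num_params    # FP16 weights
    grads = 2 * num_params     # FP16 gradients
    optim = 12 * num_params    # FP32 master weights + Adam momentum + Adam variance
    if stage >= 1:
        optim /= dp_degree     # Stage 1 partitions optimizer state
    if stage >= 2:
        grads /= dp_degree     # Stage 2 also partitions gradients
    if stage >= 3:
        params /= dp_degree    # Stage 3 also partitions the weights
    return params + grads + optim


GPT2_PARAMS = 1_500_000_000
DP_DEGREE = 8  # assumed; the assignment text does not fix a GPU count

for stage in (0, 1, 3):
    gb = zero_model_state_bytes(GPT2_PARAMS, stage, DP_DEGREE) / 1e9
    print(f"ZeRO stage {stage}: {gb:.2f} GB per GPU")
```

With these assumptions the per-GPU model state is 24 GB at Stage 0, 8.25 GB at Stage 1, and 3 GB at Stage 3. Solver output may differ if it also accounts for activations or uses a different optimizer-state layout.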
### Lab: Carbon-Aware Training (45 min)
> Using the SustainabilityModel, calculate the carbon footprint of training GPT-3 on a 256-GPU H100 cluster in Quebec vs. US Average vs. Poland. Produce a table and a 2-paragraph analysis of why datacenter location matters more than hardware choice for carbon.
**Pairs with**: [Sustainable AI slides](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_15_sustainable_ai.pdf) (grid carbon intensity, PUE).
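
The core relationship behind this lab is a product of three terms: IT energy, facility overhead (PUE), and grid carbon intensity. The sketch below is a hand calculation, not the SustainabilityModel API. Every number in it (700 W average draw per GPU, 100 hours, PUE of 1.2, and the two grid intensities) is a placeholder chosen for illustration, and the grids are deliberately unnamed so that no value is attributed to a real region.

```python
def training_carbon_kg(num_gpus, gpu_power_w, hours, pue, grid_g_per_kwh):
    """Operational carbon (kg CO2e) for a training run.

    energy (kWh) = GPUs x watts x hours / 1000, scaled by PUE for facility
    overhead, then multiplied by grid intensity (g CO2e per kWh).
    """
    it_energy_kwh = num_gpus * gpu_power_w * hours / 1000.0
    facility_energy_kwh = it_energy_kwh * pue
    return facility_energy_kwh * grid_g_per_kwh / 1000.0


# Placeholder grid intensities in g CO2e/kWh; substitute registry values.
placeholder_grids = {"low_carbon_grid": 30.0, "high_carbon_grid": 600.0}

for name, intensity in placeholder_grids.items():
    kg = training_carbon_kg(num_gpus=256, gpu_power_w=700, hours=100,
                            pue=1.2, grid_g_per_kwh=intensity)
    print(f"{name}: {kg:,.1f} kg CO2e")
```

Because emissions are linear in grid intensity, the ratio between two locations equals the ratio of their intensities when hardware and runtime are held fixed. That is the observation the 2-paragraph analysis should arrive at.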
### Lab: LLM Serving Capacity Planning (45 min)
> Using the ServingModel, determine the maximum sequence length at which Llama-3.1-70B can serve a single request on an 8-GPU H100 node without exceeding memory. Then calculate TTFT and ITL at sequence lengths of 1K, 4K, and 16K tokens. At what point does KV-cache pressure dominate?
**Pairs with**: [Model Serving slides](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_13_model_serving.pdf) and [Inference at Scale slides](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_09_inference.pdf).
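
To check whether a student's KV-cache figures are plausible, the cache size can be estimated directly from the model shape. The sketch below is independent of the ServingModel API. It assumes the publicly released Llama-3.1-70B configuration (80 layers, 8 key/value heads under grouped-query attention, head dimension 128), FP16 cache entries, and reads "1K, 4K, 16K" as 1,024, 4,096, and 16,384 tokens. Verify the shape against the MLSys Zoo entry before relying on it.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   bytes_per_elem=2, batch=1):
    """KV-cache size in bytes for a decoder-only transformer."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch


# Assumed model shape (see lead-in); not read from the mlsysim registry.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

for seq_len in (1024, 4096, 16384):
    gib = kv_cache_bytes(seq_len, LAYERS, KV_HEADS, HEAD_DIM) / 2**30
    print(f"seq_len={seq_len:>6}: {gib:.4f} GiB of KV cache per request")
```

Under these assumptions each token adds 327,680 bytes (320 KiB) of cache, so a single 16,384-token request holds 5 GiB. The cache grows linearly with sequence length and batch size, while the weight footprint stays fixed.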
### Exam Question: Back-of-Envelope
> A GPU has 1,979 TFLOP/s peak compute (FP16) and 3.35 TB/s memory bandwidth. (a) What is the ridge point in FLOP/Byte? (b) A model layer has arithmetic intensity of 50 FLOP/Byte — is it compute-bound or memory-bound? (c) Another layer has arithmetic intensity of 400 FLOP/Byte — which regime is it in, and what does that imply about the benefit of moving to a GPU with 2x the bandwidth? Show your work.
**Pairs with**: [HW Acceleration slides](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_11_hw_acceleration.pdf) (roofline model, ridge point derivation).
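
One way to produce an answer key for this question is to evaluate the roofline formulas directly. The sketch below uses only the two numbers given in the question. The function names are illustrative and are not part of MLSYSIM.

```python
def ridge_point(peak_flops, mem_bw):
    """Arithmetic intensity (FLOP/byte) at which the roofline bends."""
    return peak_flops / mem_bw


def attainable_flops(intensity, peak_flops, mem_bw):
    """Roofline bound: limited by bandwidth below the ridge, by compute above it."""
    return min(peak_flops, intensity * mem_bw)


PEAK = 1979e12   # FLOP/s, from the question
BW = 3.35e12     # bytes/s, from the question

ridge = ridge_point(PEAK, BW)
print(f"(a) ridge point = {ridge:.1f} FLOP/byte")

for label, ai in (("b", 50), ("c", 400)):
    bound = "compute-bound" if ai >= ridge else "memory-bound"
    print(f"({label}) intensity {ai} FLOP/byte is {bound}")

# Part (c) follow-up: double the bandwidth and compare attainable throughput.
speedup = attainable_flops(400, PEAK, 2 * BW) / attainable_flops(400, PEAK, BW)
print(f"(c) speedup from 2x bandwidth = {speedup:.2f}x")
```

The ridge point works out to about 590.7 FLOP/byte, so both layers are memory-bound. For the 400 FLOP/byte layer, doubling bandwidth moves the ridge to about 295 FLOP/byte, the layer becomes compute-bound, and the speedup is about 1.48x rather than 2x. Full-credit answers should note that crossover.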
### Capstone: Multi-Solver Design Study (1 week)
> Design a training cluster for a 70B-parameter model. Use the DistributedModel to select a parallelism strategy, the EconomicsModel for TCO over 6 months, the SustainabilityModel to compare three datacenter locations, and the ReliabilityModel to determine checkpoint frequency. Present your analysis as a 3-page technical memo with quantitative justification for each decision.
**Pairs with**: the full [Volume II slide set](https://mlsysbook.ai/slides/vol2.html) — infrastructure, training, fault tolerance, and sustainability.
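
For the checkpoint-frequency part of the memo, a useful reference point is the first-order Young-Daly approximation, in which the optimal interval is the square root of twice the checkpoint cost times the system MTBF. The sketch below is a hand calculation, not the ReliabilityModel API. The 60-second checkpoint cost, the node MTBF, and the node count are hypothetical, and dividing node MTBF by node count assumes independent failures.

```python
import math


def cluster_mtbf_s(node_mtbf_s, num_nodes):
    """System MTBF when any single node failure interrupts the job.

    Assumes independent, identically distributed node failures.
    """
    return node_mtbf_s / num_nodes


def young_daly_interval_s(checkpoint_cost_s, mtbf_s):
    """First-order optimal checkpoint interval: sqrt(2 * C * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


# Hypothetical inputs for illustration only.
NODE_MTBF_S = 32 * 24 * 3600   # one failure per node every 32 days
NUM_NODES = 32
CHECKPOINT_COST_S = 60.0

mtbf = cluster_mtbf_s(NODE_MTBF_S, NUM_NODES)           # 24 hours for the cluster
interval = young_daly_interval_s(CHECKPOINT_COST_S, mtbf)
print(f"cluster MTBF = {mtbf / 3600:.1f} h, checkpoint every {interval / 60:.1f} min")
```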
---
## Grading Notes
Because MLSYSIM produces deterministic output from vetted specifications:
- **Answer keys are stable** — the same `mlsysim` version produces identical numbers for every student, every semester
- **Partial credit is straightforward** — grade the reasoning (which solver, which inputs, which bottleneck explanation), not just the number
- **"Predict first" questions are easy to assess** — students submit their prediction *before* running code; compare the two for a conceptual understanding score
::: {.callout-note}
## Version Pinning
Pin the version in your assignment instructions (`pip install mlsysim==0.1.0`) so answer keys remain valid even after new releases update specifications.
:::
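
A pinned install only protects the answer key if students actually run the pinned version. One option is to open each assignment notebook with a short guard cell like the sketch below. It uses only the standard library's `importlib.metadata`. The helper name `check_pinned` is hypothetical, and `0.1.0` is the version quoted in the callout above.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pinned(package, expected):
    """Return (ok, message) describing whether the installed version matches."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False, f"{package} is not installed; run: pip install {package}=={expected}"
    if installed != expected:
        return False, f"{package} {installed} found, but the answer key assumes {expected}"
    return True, f"{package} {installed} matches the answer key"


ok, message = check_pinned("mlsysim", "0.1.0")
print(message)
```

Asking students to include this cell's output in their submission makes version mismatches visible at grading time.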
---
## Reproducibility Guarantee
All specifications in the [MLSys Zoo](zoo/index.qmd) are:
- **Sourced** from official manufacturer datasheets and published benchmarks
- **Typed** with `pint.Quantity` for dimensional correctness — unit errors are caught at runtime
- **Frozen** per release — `mlsysim==0.1.0` always produces the same answers
This means an answer key generated with a pinned version works for every student, every semester.

---
## Jupyter & Quarto Compatibility
All tutorials run in:
- **Jupyter Notebooks** — standard `.ipynb` workflow
- **Quarto documents** — render to HTML, PDF, or slides with `quarto render`
- **Google Colab** — `pip install mlsysim` in the first cell, then go
No GPU runtime is required. CPU-only environments are sufficient because MLSYSIM computes from equations, not empirical profiling.

---
## Getting Started
1. Point students to the [Getting Started](getting-started.qmd) guide for installation
2. Assign the [Hello, Roofline](tutorials/00_hello_roofline.qmd) tutorial as a warmup
3. Browse the [Solver Guide](solver-guide.qmd) to select solvers for your course topics
4. Pair each assignment with the relevant [lecture slides](https://mlsysbook.ai/slides/) for classroom context
5. Use the [MLSys Zoo](zoo/index.qmd) for available hardware, model, and infrastructure specifications
---
## Related Resources
- **[Solver Guide](solver-guide.qmd)** — which solver maps to which topic
- **[Math Foundations](math.qmd)** — all equations, for your own reference and exam prep
- **[Accuracy & Validation](accuracy.qmd)** — how close are analytical estimates to empirical results?
- **[Paper PDF](mlsysim-paper.pdf)** — the MLSys·im paper describing the framework design and validation
- **[Lecture Slides Portal](https://mlsysbook.ai/slides/)** — 35 Beamer decks with speaker notes and active learning
- **[Teaching Guide](https://mlsysbook.ai/slides/teaching.html)** — semester plans, customization, and the active learning taxonomy