cs249r_book/mlsysim/tutorial/instructor-quickstart.md
Vijay Janapa Reddi f62dc8cca2 docs(tutorial): instructor quick-start guide — 15 min adoption path
Covers: install, first demo, first homework, semester plan, auto-grading
hints, and material inventory. Designed to minimize instructor adoption
friction for ML systems courses.
2026-04-01 23:19:21 -04:00

# Instructor Quick-Start Guide
> **Get mlsysim into your ML Systems course in 15 minutes.**
---
## The 15-Minute Adoption Path
### Minutes 0–3: Install and Verify
```bash
pip install mlsysim
python3 -c "import mlsysim; print(mlsysim.__version__)"
# Should print: 0.1.0
```
### Minutes 3–8: Your First Live Demo
Copy this into a Jupyter cell or Python script. This IS your first lecture demo:
```python
import mlsysim
# "Is Llama-3 8B compute-bound or memory-bound on an H100?"
profile = mlsysim.Engine.solve(
    mlsysim.Models.Language.Llama3_8B,
    mlsysim.Hardware.Cloud.H100,
    batch_size=1,
)
print(f"Bottleneck: {profile.bottleneck}") # → Memory
print(f"MFU: {profile.mfu:.3f}") # → 0.003 (nearly idle compute!)
print(f"Latency: {profile.latency:.2f}") # → 5.60 ms
# Now change batch_size to 256 and watch the bottleneck shift...
```
**The teaching moment:** Students predict "Compute-bound, because GPUs are fast."
They run it and see "Memory-bound, MFU=0.3%." Their intuition breaks. You rebuild
it with the Roofline model. That's the entire first lecture.
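The demo's numbers can be sanity-checked on the board with nothing but public H100 spec-sheet figures. The sketch below is a back-of-envelope roofline check, not the simulator's internals: it assumes ~989 TFLOPS FP16 dense, 3.35 TB/s HBM bandwidth, FP16 weights, and ~2 FLOPs per parameter per decoded token; `mlsysim` may model additional overheads (e.g. KV cache traffic), which is why its latency comes out slightly higher.

```python
# Back-of-envelope roofline check of the demo (assumed public H100
# SXM specs; the simulator's internal model may differ slightly).
PEAK_FLOPS = 989e12   # FP16 dense, FLOP/s
HBM_BW     = 3.35e12  # bytes/s
PARAMS     = 8e9      # Llama-3 8B
BYTES_PER_PARAM = 2   # FP16 weights

# Decoding one token at batch=1: every weight is read once, and each
# parameter contributes ~2 FLOPs (multiply + add).
flops_per_token = 2 * PARAMS
bytes_per_token = BYTES_PER_PARAM * PARAMS

intensity = flops_per_token / bytes_per_token   # FLOPs per byte (~1)
ridge     = PEAK_FLOPS / HBM_BW                 # ridge point (~295 FLOPs/byte)

print(f"Arithmetic intensity: {intensity:.1f} FLOPs/byte")
print(f"Ridge point:          {ridge:.0f} FLOPs/byte")
print("Memory-bound" if intensity < ridge else "Compute-bound")

# In the memory-bound regime, latency is just weight traffic / bandwidth:
latency_ms = bytes_per_token / HBM_BW * 1e3
print(f"Lower-bound latency: {latency_ms:.2f} ms")

# Achieved FLOP/s over peak gives the (tiny) MFU:
mfu = flops_per_token / (latency_ms / 1e3) / PEAK_FLOPS
print(f"Implied MFU: {mfu:.4f}")
```

An intensity of ~1 FLOP/byte sits far below the ~295 FLOPs/byte ridge point, which is the whole first-lecture punchline in three lines of arithmetic.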
### Minutes 8–12: Your First Homework Problem
> **Problem:** The H100 has 6.3× more FLOPS than the A100 (989 vs 156 TFLOPS FP16 dense).
> Use `Engine.solve()` to compare Llama-3-8B inference at batch=1 on both.
> How much faster is H100? Why isn't it 6.3×?
**Expected answer:** ~1.7× faster. A memory-bound workload scales with the
bandwidth ratio (3.35/2.04 ≈ 1.64×), not the FLOPS ratio. Students learn that
advertised FLOPS mean nothing for memory-bound workloads.
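The expected answer reduces to two ratios. The sketch below uses the spec numbers quoted in the problem (989 vs 156 TFLOPS, 3.35 vs 2.04 TB/s) as assumptions, independently of what `Engine.solve()` returns:

```python
# Why the observed H100 speedup tracks bandwidth, not FLOPS.
# Spec numbers are the ones quoted in the problem statement.
h100 = {"tflops": 989, "bw_tbs": 3.35}
a100 = {"tflops": 156, "bw_tbs": 2.04}

flops_ratio = h100["tflops"] / a100["tflops"]   # what the marketing says
bw_ratio    = h100["bw_tbs"] / a100["bw_tbs"]   # what batch=1 actually sees

print(f"FLOPS ratio:     {flops_ratio:.1f}x")
print(f"Bandwidth ratio: {bw_ratio:.2f}x")
```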
### Minutes 12–15: Plan Your Semester
| Week | Topic | mlsysim Exercise |
|------|-------|-----------------|
| 2 | Roofline Model | Batch-size sweep: find the compute↔memory crossover |
| 4 | LLM Serving | KV cache capacity: how many concurrent requests? |
| 6 | Quantization | Compress Llama-3 to INT4: what's the fleet impact? |
| 8 | Distributed Training | Scale from 8 to 256 GPUs: find the efficiency cliff |
| 10 | TCO & Carbon | Move training to Quebec: how much carbon saved? |
| 12 | Design Challenge | $5M budget, Llama-3 70B at 1000 QPS — design the fleet |
Each exercise takes 20–30 minutes of class time. Solutions are in
[exercises.md](exercises.md).
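The Week 2 batch-size sweep also has a hand-checkable answer. Under the simplifying assumption that a weight matrix is read once per batch while FLOPs grow with batch size (FP16 weights, ~2 FLOPs per parameter per sample), arithmetic intensity scales roughly linearly with batch, so the crossover lands near the ridge point. This is a hypothetical sketch, not the simulator's model:

```python
# Hand-check for the Week 2 batch sweep: weights are reused across the
# batch, so FLOPs grow with batch size while weight traffic stays fixed.
# With FP16 weights, intensity ~ batch (assumed simplification).
PEAK_FLOPS = 989e12   # H100 FP16 dense, FLOP/s (assumed spec)
HBM_BW     = 3.35e12  # bytes/s (assumed spec)
ridge = PEAK_FLOPS / HBM_BW   # ~295 FLOPs/byte

for batch in [1, 16, 64, 256, 512]:
    intensity = batch * 1.0   # ~2 FLOPs / 2 bytes per param, per sample
    regime = "memory-bound" if intensity < ridge else "compute-bound"
    print(f"batch={batch:4d}  intensity~{intensity:6.0f}  {regime}")
```

The crossover falls between batch 256 and 512, which matches the demo's suggestion to rerun at `batch_size=256` and watch the bottleneck shift.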
---
## What Students Need
- Python 3.10+
- `pip install mlsysim`
- No GPU required. No cloud account. Everything runs on a laptop CPU.
- See [prerequisites.md](prerequisites.md) for detailed setup.
## What You Get
| Material | File | Description |
|----------|------|-------------|
| Tutorial slides | `slides/tutorial_part1.tex` + `tutorial_part2.tex` | 102 Beamer slides with speaker notes |
| 8 exercises | `exercises.md` | Hands-on problems with expected answers |
| Cheat sheet | `cheatsheet.md` | Single-page reference (Iron Law + key equations) |
| Pre/post quiz | `assessment/quiz.md` | 10-question assessment with distractor analysis |
| Backward design | `DESIGN.md` | Learning goals, "aha moments," facilitation notes |
| 6 SVG figures | `slides/images/svg/` | Publication-quality diagrams (Roofline, AllReduce, etc.) |
## Auto-Grading Hint
mlsysim returns typed Pydantic objects. You can auto-grade by checking:
```python
# Student submits their analysis
result = mlsysim.Engine.solve(model, hardware, batch_size=student_batch)
# Auto-grade: check the bottleneck label
assert result.bottleneck == "Memory", "Expected Memory-bound at batch=1"
# Auto-grade: check MFU is in the right range
assert 0.001 < result.mfu < 0.01, f"MFU {result.mfu:.3f} outside expected range"
# Auto-grade: check feasibility
assert result.feasible, "Model should fit on this hardware"
```
This works with Gradescope, nbgrader, or any Python-based autograder.
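Bare `assert`s give all-or-nothing grades; many autograders prefer partial credit. Below is a hedged sketch of a tolerance-based grader; `Result` is a stand-in dataclass for illustration, not `mlsysim`'s actual return type, and `grade()` is a hypothetical helper:

```python
# Sketch of partial-credit grading. Result is a stand-in dataclass,
# NOT mlsysim's real return type; grade() is a hypothetical helper.
from dataclasses import dataclass

@dataclass
class Result:
    bottleneck: str
    mfu: float
    feasible: bool

def grade(result, *, bottleneck, mfu_range, points=10):
    """Return (score, feedback) instead of raising, so one wrong
    field does not zero out the whole submission."""
    score, feedback = 0, []
    if result.bottleneck == bottleneck:
        score += points // 2
    else:
        feedback.append(f"expected {bottleneck}-bound, got {result.bottleneck}")
    lo, hi = mfu_range
    if lo < result.mfu < hi:
        score += points - points // 2
    else:
        feedback.append(f"MFU {result.mfu:.3f} outside ({lo}, {hi})")
    if not result.feasible:
        feedback.append("model does not fit on this hardware")
    return score, feedback

# Usage with a mocked-up submission result:
score, fb = grade(Result("Memory", 0.003, True),
                  bottleneck="Memory", mfu_range=(0.001, 0.01))
print(score, fb)
```

Returning feedback strings rather than raising also lets the autograder surface the reason for lost points directly to the student.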
## The Pedagogical Framework
mlsysim is organized around the **Iron Law of ML Systems**:
```
Time = FLOPs / (N × Peak × MFU × η_scaling × Goodput)
```
Every concept in your course maps to one term in this equation. Every homework
problem is about understanding which term is the bottleneck and how to improve it.
The 22-wall taxonomy provides the vocabulary; the Iron Law provides the structure.
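To make the equation concrete in lecture, it helps to plug in numbers. The values below are illustrative assumptions (an 8B-parameter model trained on 1T tokens via the ~6·N·D FLOPs rule of thumb, on a hypothetical 256-GPU H100 fleet), not output from the package:

```python
# Plugging illustrative (made-up) numbers into the Iron Law.
flops     = 6 * 8e9 * 1e12   # ~6*N*D rule of thumb: 8B params, 1T tokens
N         = 256              # GPUs (hypothetical fleet)
peak      = 989e12           # per-GPU FP16 FLOP/s (assumed H100 spec)
mfu       = 0.40             # achieved model FLOPs utilization
eta_scale = 0.90             # multi-GPU scaling efficiency
goodput   = 0.95             # fraction of wall-clock doing useful work

time_s = flops / (N * peak * mfu * eta_scale * goodput)
print(f"Training time: {time_s / 86400:.1f} days")
```

Each homework knob then maps directly onto one factor: quantization and kernels move MFU, parallelism strategy moves η_scaling, and preemptions or failures move Goodput.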
See [laws-explained.md](../docs/laws-explained.md) for plain-English explanations
of all 22 constraints.
## Getting Help
- **Issues:** [github.com/harvard-edge/cs249r_book/issues](https://github.com/harvard-edge/cs249r_book/issues) (use the mlsysim template)
- **Documentation:** [mlsysbook.ai/mlsysim](https://mlsysbook.ai/mlsysim)
- **Citation:** See CITATION.cff in the package root