Covers: install, first demo, first homework, semester plan, auto-grading hints, and material inventory. Designed to minimize instructor adoption friction for ML systems courses.
4.4 KiB
Instructor Quick-Start Guide
Get mlsysim into your ML Systems course in 15 minutes.
The 15-Minute Adoption Path
Minute 0–3: Install and Verify
pip install mlsysim
python3 -c "import mlsysim; print(mlsysim.__version__)"
# Should print: 0.1.0
Minute 3–8: Your First Live Demo
Copy this into a Jupyter cell or Python script. This IS your first lecture demo:
import mlsysim
# "Is Llama-3 8B compute-bound or memory-bound on an H100?"
profile = mlsysim.Engine.solve(
mlsysim.Models.Language.Llama3_8B,
mlsysim.Hardware.Cloud.H100,
batch_size=1,
)
print(f"Bottleneck: {profile.bottleneck}") # → Memory
print(f"MFU: {profile.mfu:.3f}") # → 0.003 (nearly idle compute!)
print(f"Latency: {profile.latency:.2f}") # → 5.60 ms
# Now change batch_size to 256 and watch the bottleneck shift...
The teaching moment: Students predict "Compute-bound, because GPUs are fast." They run it and see "Memory-bound, MFU=0.3%." Their intuition breaks. You rebuild it with the Roofline model. That's the entire first lecture.
Minute 8–12: Your First Homework Problem
Problem: The H100 has 6.3× more FLOPS than the A100 (989 vs 156 TFLOPS FP16 dense). Use
Engine.solve()to compare Llama-3-8B inference at batch=1 on both. How much faster is H100? Why isn't it 6.3×?
Expected answer: ~1.7× faster. Memory-bound workload scales with bandwidth ratio (3.35/2.04 ≈ 1.64×), not FLOPS ratio. Students learn that advertising FLOPS means nothing for memory-bound workloads.
Minute 12–15: Plan Your Semester
| Week | Topic | mlsysim Exercise |
|---|---|---|
| 2 | Roofline Model | Batch-size sweep: find the compute↔memory crossover |
| 4 | LLM Serving | KV cache capacity: how many concurrent requests? |
| 6 | Quantization | Compress Llama-3 to INT4: what's the fleet impact? |
| 8 | Distributed Training | Scale from 8 to 256 GPUs: find the efficiency cliff |
| 10 | TCO & Carbon | Move training to Quebec: how much carbon saved? |
| 12 | Design Challenge | $5M budget, Llama-3 70B at 1000 QPS — design the fleet |
Each exercise takes 20–30 minutes of class time. Solutions are in exercises.md.
What Students Need
- Python 3.10+
pip install mlsysim- No GPU required. No cloud account. Everything runs on a laptop CPU.
- See prerequisites.md for detailed setup.
What You Get
| Material | File | Description |
|---|---|---|
| Tutorial slides | slides/tutorial_part1.tex + tutorial_part2.tex |
102 Beamer slides with speaker notes |
| 8 exercises | exercises.md |
Hands-on problems with expected answers |
| Cheat sheet | cheatsheet.md |
Single-page reference (Iron Law + key equations) |
| Pre/post quiz | assessment/quiz.md |
10-question assessment with distractor analysis |
| Backward design | DESIGN.md |
Learning goals, "aha moments," facilitation notes |
| 6 SVG figures | slides/images/svg/ |
Publication-quality diagrams (Roofline, AllReduce, etc.) |
Auto-Grading Hint
mlsysim returns typed Pydantic objects. You can auto-grade by checking:
# Student submits their analysis
result = mlsysim.Engine.solve(model, hardware, batch_size=student_batch)
# Auto-grade: check the bottleneck label
assert result.bottleneck == "Memory", "Expected Memory-bound at batch=1"
# Auto-grade: check MFU is in the right range
assert 0.001 < result.mfu < 0.01, f"MFU {result.mfu:.3f} outside expected range"
# Auto-grade: check feasibility
assert result.feasible, "Model should fit on this hardware"
This works with Gradescope, nbgrader, or any Python-based autograder.
The Pedagogical Framework
mlsysim is organized around the Iron Law of ML Systems:
Time = FLOPs / (N × Peak × MFU × η_scaling × Goodput)
Every concept in your course maps to one term in this equation. Every homework problem is about understanding which term is the bottleneck and how to improve it. The 22-wall taxonomy provides the vocabulary; the Iron Law provides the structure.
See laws-explained.md for plain-English explanations of all 22 constraints.
Getting Help
- Issues: github.com/harvard-edge/cs249r_book/issues (use the mlsysim template)
- Documentation: mlsysbook.ai/mlsysim
- Citation: See CITATION.cff in the package root