mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-22 22:33:28 -05:00
Two issues caused the deployed slide PDFs to be unusable:
1. Every chapter .tex declared `\setsansfont{Helvetica Neue}` — proprietary
to Apple, not installed on the Ubuntu CI runner. xelatex bombed mid-frame,
the workflow's `|| true` swallowed the error, and the resulting PDF had
most text never typeset (blank pages with only logos/rules surviving).
Switch all 35 decks to TeX Gyre Heros (sans) and TeX Gyre Cursor (mono),
both bundled with texlive-fonts-extra — no external font downloads needed.
Drop the JetBrains Mono wget step and fonts-liberation from both slide
workflows accordingly.
2. Vol1 and Vol2 each ship `00_course_overview.pdf` and `01_introduction.pdf`.
The publish workflow uploaded them to a flat GitHub Release namespace, so
the second upload silently overwrote the first — clicking Vol I's Course
Overview actually downloaded Vol II's deck. Stage prefixed copies
(vol1_*.pdf, vol2_*.pdf) before upload, and update slides/vol{1,2}.qmd
plus the mlsysim cross-links to point at the new prefixed URLs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
146 lines
8.7 KiB
Plaintext
---
title: "For Students"
subtitle: "Build intuition for ML systems -- without needing GPU hardware."
---

Whether you are taking your first ML systems course or preparing for industry interviews, MLSYSIM lets you experiment with real hardware specifications and see exactly *why* systems behave the way they do. Every number comes from a real datasheet. Every equation is grounded in peer-reviewed literature.

---

## What You Will Learn

By working through the MLSYSIM tutorials and exercises, you will:

- **Identify bottlenecks** -- Determine whether a workload is memory-bound or compute-bound on any hardware, and understand *why*
- **Reason quantitatively** -- Use real datasheet numbers (not made-up examples) to calculate latency, throughput, and cost
- **Build systems intuition** -- See how batch size, precision, parallelism strategy, and datacenter location each affect performance
- **Think across the stack** -- Connect workload characteristics to hardware specs to infrastructure constraints

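To make the first goal concrete, here is a minimal sketch of the roofline comparison behind a memory-bound vs. compute-bound verdict. It is plain Python and does not use the MLSYSIM API. The `Accelerator` class, the `roofline` function, and every number below are hypothetical placeholders chosen for easy arithmetic, not datasheet values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Peak numbers for one device (hypothetical values are passed in below)."""
    peak_flops: float      # FLOP/s
    mem_bandwidth: float   # bytes/s


def roofline(flops: float, bytes_moved: float, hw: Accelerator) -> tuple[str, float]:
    """Return (bottleneck, latency in seconds) under the basic roofline model."""
    compute_time = flops / hw.peak_flops
    memory_time = bytes_moved / hw.mem_bandwidth
    bottleneck = "Compute" if compute_time >= memory_time else "Memory"
    # The slower of the two resources sets the latency floor.
    return bottleneck, max(compute_time, memory_time)


# Hypothetical device: 100 TFLOP/s peak compute, 1 TB/s memory bandwidth.
hw = Accelerator(peak_flops=100e12, mem_bandwidth=1e12)
ridge_point = hw.peak_flops / hw.mem_bandwidth  # FLOP per byte

# Hypothetical workload: 8 GFLOP of work and 200 MB of memory traffic.
flops, bytes_moved = 8e9, 200e6
bottleneck, latency = roofline(flops, bytes_moved, hw)

print(f"Ridge point:          {ridge_point:.0f} FLOP/byte")
print(f"Arithmetic intensity: {flops / bytes_moved:.0f} FLOP/byte")
print(f"Bottleneck: {bottleneck}, latency floor: {latency * 1e3:.2f} ms")
```

A workload whose arithmetic intensity falls below the ridge point is limited by memory bandwidth, and one above it is limited by compute. The tutorials cover how MLSYSIM itself applies this idea to registry hardware.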
---

## Prerequisites

- **Python**: Comfortable with functions, loops, and f-strings
- **Math**: Basic algebra (no calculus required -- all solver equations are arithmetic)
- **ML**: Familiarity with terms like "model parameters," "inference," and "training" (the [Glossary](glossary.qmd) defines everything else)

No GPU, no cloud account, no special hardware required. Just:

```bash
pip install mlsysim
```

See the [Getting Started](getting-started.qmd) guide for development installs and Colab/Binder options.

---

## Quick Start

```python
import mlsysim
from mlsysim import Engine

# Load a model and hardware from the vetted registry
model = mlsysim.Models.ResNet50
gpu = mlsysim.Hardware.Cloud.A100

# Solve: is this workload memory-bound or compute-bound?
profile = Engine.solve(model=model, hardware=gpu, batch_size=1, precision="fp16")

print(f"Bottleneck: {profile.bottleneck}") # → Memory
print(f"Latency: {profile.latency.to('ms'):~.2f}")
```

---

## Your Learning Path

Start at the top and work through the steps in order. Each tutorial builds on the one before it. The **Companion Slides** column links directly to the lecture deck that covers the same material -- use them for visual explanations, worked examples, and active learning exercises.

| Step | Tutorial | You Will Learn | Time | Companion Slides |
|:-----|:---------|:---------------|:-----|:-----------------|
| 1 | [Hello, Roofline](tutorials/00_hello_roofline.qmd) | The roofline model, memory-bound vs. compute-bound, batch size sweeps | 15 min | [Hardware Acceleration (Vol I, Ch 11)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_11_hw_acceleration.pdf){target="_blank"} |
| 2 | [Geography is a Systems Variable](tutorials/07_geography.qmd) | Energy, carbon footprint, regional grid effects | 20 min | [Sustainable AI (Vol II, Ch 15)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_15_sustainable_ai.pdf){target="_blank"} |
| 3 | [Two Phases of Inference](tutorials/02_two_phases.qmd) | TTFT vs. ITL, KV-cache pressure, the two phases of LLM inference | 25 min | [Model Serving (Vol I, Ch 13)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_13_model_serving.pdf){target="_blank"} and [Inference at Scale (Vol II, Ch 9)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_09_inference.pdf){target="_blank"} |
| 4 | [Distributed Training](tutorials/distributed.qmd) | Data/tensor/pipeline parallelism, communication overhead, scaling efficiency | 30 min | [Distributed Training (Vol II, Ch 5)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_05_distributed_training.pdf){target="_blank"} and [Collective Communication (Vol II, Ch 6)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_06_collective_communication.pdf){target="_blank"} |

: {tbl-colwidths="[5,15,35,7,38]"}

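Step 3 in the table mentions KV-cache pressure. As a rough preview of the arithmetic involved, the sketch below estimates KV-cache size with the commonly used formula for decoder-only transformers. It is plain Python, not the MLSYSIM API. The function name `kv_cache_bytes` and the model shape are hypothetical and are not read from the MLSYSIM registry, and the tutorial may account for additional terms.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_value):
    """Bytes needed to store keys and values for every layer and token."""
    # Factor of 2: one tensor for keys and one for values.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical decoder shape, with fp16 values (2 bytes each).
shape = dict(num_layers=32, num_kv_heads=32, head_dim=128, bytes_per_value=2)

for batch_size in (1, 8, 32):
    size = kv_cache_bytes(seq_len=4096, batch_size=batch_size, **shape)
    print(f"batch={batch_size:>2}  KV cache = {size / 2**30:.1f} GiB")
```

The cache grows linearly with both batch size and sequence length, which is why serving many long requests at once can exhaust accelerator memory before compute becomes the limit.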
::: {.callout-tip}
## Predict Before You Compute
Every tutorial includes "predict first" exercises. Before running code, write down what you expect. This practice builds the mental models that make you effective at systems reasoning. The companion slide decks include the same predict-first methodology with 8--11 active learning moments per deck.
:::

---

## How MLSYSIM Maps to the Textbook and Slides

MLSYSIM is the companion framework for the [Machine Learning Systems](https://mlsysbook.ai) textbook. Each solver maps to specific chapters and slide decks. Use the slide links below to review the theory before (or after) running the solver.

| MLSYSIM Solver | What It Models | Textbook Topic | Slide Deck |
|:---------------|:---------------|:---------------|:-----------|
| SingleNodeModel | Roofline analysis, compute vs. memory bottleneck | Hardware Acceleration | [Vol I, Ch 11](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_11_hw_acceleration.pdf){target="_blank"} |
| ServingModel | TTFT, ITL, KV-cache memory | Model Serving | [Vol I, Ch 13](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_13_model_serving.pdf){target="_blank"} |
| DistributedModel | 3D parallelism, all-reduce, pipeline bubbles | Distributed Training | [Vol II, Ch 5](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_05_distributed_training.pdf){target="_blank"} |
| EconomicsModel | CapEx, OpEx, TCO | Compute Infrastructure | [Vol II, Ch 2](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_02_compute_infrastructure.pdf){target="_blank"} |
| SustainabilityModel | Energy, carbon, water usage | Sustainable AI | [Vol II, Ch 15](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_15_sustainable_ai.pdf){target="_blank"} |
| ReliabilityModel | MTBF, checkpoint interval | Fault Tolerance | [Vol II, Ch 7](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_07_fault_tolerance.pdf){target="_blank"} |

: {tbl-colwidths="[18,25,17,40]"}

Not using the textbook? No problem -- MLSYSIM is self-contained. The [Math Foundations](math.qmd) page documents every equation, and each slide deck stands on its own with full speaker notes.

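To give a feel for the arithmetic-only style of these equations, here is a small sketch for the MTBF and checkpoint-interval topic in the table. It uses Young's classic approximation, `interval = sqrt(2 * checkpoint_cost * MTBF)`, and assumes independent node failures, so cluster MTBF is node MTBF divided by node count. Whether `ReliabilityModel` uses this exact formulation is an assumption; the Math Foundations page is the authority. The function names and input numbers are hypothetical.

```python
import math


def cluster_mtbf_seconds(node_mtbf_hours: float, num_nodes: int) -> float:
    """Cluster MTBF assuming independent, identically reliable nodes."""
    return node_mtbf_hours * 3600 / num_nodes


def young_checkpoint_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's approximation for the checkpoint interval that minimizes lost work."""
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)


# Hypothetical inputs: 50,000-hour node MTBF, 1,000 nodes, 60 s per checkpoint.
mtbf_s = cluster_mtbf_seconds(node_mtbf_hours=50_000, num_nodes=1_000)
interval_s = young_checkpoint_interval(checkpoint_cost_s=60, mtbf_s=mtbf_s)

print(f"Cluster MTBF: {mtbf_s / 3600:.0f} hours")
print(f"Checkpoint roughly every {interval_s / 60:.0f} minutes")
```

With these inputs, a single node that fails about once every 50,000 hours becomes a cluster that fails about every 50 hours. This is the kind of scale effect the Fault Tolerance deck explores.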
---

## Recommended Study Workflow

Whether you are self-studying or following a course, this workflow maximizes retention:

1. **Read** the textbook chapter (or skim the slide deck) to get the conceptual framework
2. **Predict** what will happen before running any code -- write it down
3. **Simulate** using MLSYSIM to test your prediction against real hardware specs
4. **Explore** by changing one parameter at a time (batch size, precision, hardware) and observing the effect
5. **Reflect** on where your prediction was wrong -- that gap is where learning happens

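The predict, simulate, explore, and reflect steps above can be sketched as a one-parameter sweep. The example below is self-contained Python with a stand-in `bottleneck` function rather than the MLSYSIM solver, and every workload and device number is a hypothetical placeholder. In practice you would swap the stand-in for a real solver call.

```python
def bottleneck(batch_size, flops_per_sample, weight_bytes,
               activation_bytes_per_sample, peak_flops, mem_bandwidth):
    """Stand-in solver: a simple roofline-style comparison of two time estimates."""
    compute_time = batch_size * flops_per_sample / peak_flops
    memory_time = (weight_bytes
                   + batch_size * activation_bytes_per_sample) / mem_bandwidth
    return "Compute" if compute_time >= memory_time else "Memory"


# Predict: write down the batch size where you expect the flip to compute-bound.
predicted_flip = 8

# Hypothetical workload and device numbers, held fixed for the whole sweep.
fixed = dict(flops_per_sample=8e9, weight_bytes=50e6,
             activation_bytes_per_sample=20e6,
             peak_flops=300e12, mem_bandwidth=1.5e12)

# Simulate and explore: vary batch size only, keep everything else constant.
results = {b: bottleneck(b, **fixed) for b in (1, 2, 4, 8, 16, 32)}
observed_flip = next((b for b, kind in results.items() if kind == "Compute"), None)

# Reflect: compare the prediction with the observation.
for b, kind in results.items():
    print(f"batch={b:>2}  {kind}-bound")
print(f"Predicted flip at batch {predicted_flip}, observed at batch {observed_flip}")
```

Changing only one input per sweep keeps the cause of any surprise unambiguous, which is what makes the reflection step useful.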
::: {.callout-note}
## Self-Study vs. Classroom
If you are self-studying, the slide decks include **speaker notes** with timing guidance, teaching tips, and common misconceptions -- they are written to be useful even without an instructor. If you are in a course, your instructor may assign specific tutorials as homework; check the [Instructor Guide](for-instructors.qmd) for the recommended pairing.
:::

---

## Slides at a Glance

The full slide collection covers both volumes of the textbook. Every deck includes speaker notes, active learning exercises, and original SVG diagrams.

:::: {.columns}

::: {.column width="50%"}
**Volume I: Foundations** (17 decks, 570 slides)

Covers the single-machine ML stack: data engineering, neural computation, architectures, frameworks, training, compression, hardware acceleration, serving, and operations.

[Browse Vol I Decks](https://mlsysbook.ai/slides/vol1.html){target="_blank"} | [Download All (PDF)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/MLSysBook-Slides-Vol1-PDF.zip){target="_blank"}
:::

::: {.column width="50%"}
**Volume II: At Scale** (18 decks, 529 slides)

Covers distributed infrastructure: compute clusters, network fabrics, distributed training, fault tolerance, fleet orchestration, inference at scale, and governance.

[Browse Vol II Decks](https://mlsysbook.ai/slides/vol2.html){target="_blank"} | [Download All (PDF)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/MLSysBook-Slides-Vol2-PDF.zip){target="_blank"}
:::

::::

---

## Next Steps

- **[Getting Started](getting-started.qmd)** -- Install MLSYSIM and run your first analysis
- **[Hello, Roofline Tutorial](tutorials/00_hello_roofline.qmd)** -- Your first roofline analysis
- **[Solver Guide](solver-guide.qmd)** -- Deep dive into each solver's capabilities
- **[Glossary](glossary.qmd)** -- Look up any unfamiliar term
- **[Math Foundations](math.qmd)** -- The equations behind every solver
- **[All Slide Decks](https://mlsysbook.ai/slides/)** -- 35 Beamer decks with speaker notes and active learning exercises