---
title: "For Students"
subtitle: "Build intuition for ML systems -- without needing GPU hardware."
---

Whether you are taking your first ML systems course or preparing for industry interviews, MLSYSIM lets you experiment with real hardware specifications and see exactly *why* systems behave the way they do. Every number comes from a real datasheet. Every equation is grounded in peer-reviewed literature.

---

## What You Will Learn

By working through the MLSYSIM tutorials and exercises, you will:

- **Identify bottlenecks** -- Determine whether a workload is memory-bound or compute-bound on any hardware, and understand *why*
- **Reason quantitatively** -- Use real datasheet numbers (not made-up examples) to calculate latency, throughput, and cost
- **Build systems intuition** -- See how batch size, precision, parallelism strategy, and datacenter location each affect performance
- **Think across the stack** -- Connect workload characteristics to hardware specs to infrastructure constraints

---

## Prerequisites

- **Python**: Comfortable with functions, loops, and f-strings
- **Math**: Basic algebra (no calculus required -- all solver equations are arithmetic)
- **ML**: Familiarity with terms like "model parameters," "inference," and "training" (the [Glossary](glossary.qmd) defines everything else)

No GPU, no cloud account, no special hardware required. Just:

```bash
pip install mlsysim
```

See the [Getting Started](getting-started.qmd) guide for development installs and Colab/Binder options.

---

## Quick Start

```python
import mlsysim
from mlsysim import Engine

# Load a model and hardware from the vetted registry
model = mlsysim.Models.ResNet50
gpu   = mlsysim.Hardware.Cloud.A100

# Solve: is this workload memory-bound or compute-bound?
profile = Engine.solve(model=model, hardware=gpu, batch_size=1, precision="fp16")

print(f"Bottleneck: {profile.bottleneck}")   # → Memory
print(f"Latency:    {profile.latency.to('ms'):~.2f}")
```
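
The bottleneck label comes from a roofline-style comparison: how long the math would take at peak compute versus how long the data movement would take at peak memory bandwidth. The following is a minimal plain-Python sketch of that idea, not MLSYSIM's implementation. The function name `roofline` and all four input values are assumptions chosen for easy arithmetic, not datasheet numbers.

```python
def roofline(flops, bytes_moved, peak_flops, mem_bandwidth):
    """Return (latency_seconds, bottleneck) under a simple roofline model."""
    compute_time = flops / peak_flops          # time if only the math mattered
    memory_time = bytes_moved / mem_bandwidth  # time if only data movement mattered
    # The slower of the two sets the latency and names the bottleneck.
    if memory_time > compute_time:
        return memory_time, "Memory"
    return compute_time, "Compute"


# Placeholder values for illustration only, NOT taken from any datasheet.
latency, bottleneck = roofline(
    flops=4e9,           # arithmetic work for one inference
    bytes_moved=100e6,   # weights and activations read or written
    peak_flops=100e12,   # 100 TFLOP/s
    mem_bandwidth=1e12,  # 1 TB/s
)

print(f"Bottleneck: {bottleneck}")
print(f"Latency:    {latency * 1e3:.3f} ms")
```

With these placeholders the memory time (0.1 ms) exceeds the compute time (0.04 ms), so the sketch reports a memory-bound workload.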

---

## Your Learning Path

Start at the top and work through in order. Each tutorial builds on the one before it. The **Companion Slides** column links directly to the lecture deck (or decks) that cover the same material -- use them for visual explanations, worked examples, and active learning exercises.

| Step | Tutorial | You Will Learn | Time | Companion Slides |
|:-----|:---------|:---------------|:-----|:-----------------|
| 1 | [Hello, Roofline](tutorials/00_hello_roofline.qmd) | The roofline model, memory-bound vs. compute-bound, batch size sweeps | 15 min | [Hardware Acceleration (Vol I, Ch 11)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_11_hw_acceleration.pdf){target="_blank"} |
| 2 | [Geography is a Systems Variable](tutorials/07_geography.qmd) | Energy, carbon footprint, regional grid effects | 20 min | [Sustainable AI (Vol II, Ch 15)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_15_sustainable_ai.pdf){target="_blank"} |
| 3 | [Two Phases of Inference](tutorials/02_two_phases.qmd) | TTFT vs. ITL, KV-cache pressure, the two phases of LLM inference | 25 min | [Model Serving (Vol I, Ch 13)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_13_model_serving.pdf){target="_blank"} and [Inference at Scale (Vol II, Ch 9)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_09_inference.pdf){target="_blank"} |
| 4 | [Distributed Training](tutorials/distributed.qmd) | Data/tensor/pipeline parallelism, communication overhead, scaling efficiency | 30 min | [Distributed Training (Vol II, Ch 5)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_05_distributed_training.pdf){target="_blank"} and [Collective Communication (Vol II, Ch 6)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_06_collective_communication.pdf){target="_blank"} |

: {tbl-colwidths="[5,15,35,7,38]"}
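
Step 3 mentions KV-cache pressure. As a rough preview, the cache size can be estimated with the commonly used formula below. This is a sketch under stated assumptions, not the code the tutorial or MLSYSIM uses: the function name `kv_cache_bytes` and the decoder shape are hypothetical, and the formula assumes a standard attention layout with 2 bytes per cached value (fp16).

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size,
                   bytes_per_value=2):
    """Bytes needed to cache keys and values for every token in the batch."""
    # Leading factor of 2: one tensor for keys and one for values in each layer.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical decoder shape for illustration, not a model from the registry.
size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                      seq_len=4096, batch_size=1)

print(f"KV cache: {size / 2**30:.1f} GiB")  # prints "KV cache: 2.0 GiB"
```

Because the estimate is linear in both `seq_len` and `batch_size`, doubling either one doubles the memory the cache needs.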

::: {.callout-tip}
## Predict Before You Compute
Every tutorial includes "predict first" exercises. Before running code, write down what you expect. This practice builds the mental models that make you effective at systems reasoning. The companion slide decks use the same predict-first methodology, with 8--11 active learning moments per deck.
:::

---

## How MLSYSIM Maps to the Textbook and Slides

MLSYSIM is the companion framework for the [Machine Learning Systems](https://mlsysbook.ai) textbook. Each solver maps to specific chapters and slide decks. Use the slide links below to review the theory before (or after) running the solver.

| MLSYSIM Solver | What It Models | Textbook Topic | Slide Deck |
|:---------------|:---------------|:---------------|:-----------|
| SingleNodeModel | Roofline analysis, compute vs. memory bottleneck | Hardware Acceleration | [Vol I, Ch 11](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_11_hw_acceleration.pdf){target="_blank"} |
| ServingModel | TTFT, ITL, KV-cache memory | Model Serving | [Vol I, Ch 13](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_13_model_serving.pdf){target="_blank"} |
| DistributedModel | 3D parallelism, all-reduce, pipeline bubbles | Distributed Training | [Vol II, Ch 5](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_05_distributed_training.pdf){target="_blank"} |
| EconomicsModel | CapEx, OpEx, TCO | Compute Infrastructure | [Vol II, Ch 2](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_02_compute_infrastructure.pdf){target="_blank"} |
| SustainabilityModel | Energy, carbon, water usage | Sustainable AI | [Vol II, Ch 15](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_15_sustainable_ai.pdf){target="_blank"} |
| ReliabilityModel | MTBF, checkpoint interval | Fault Tolerance | [Vol II, Ch 7](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_07_fault_tolerance.pdf){target="_blank"} |

: {tbl-colwidths="[18,25,17,40]"}
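
To give a feel for the arithmetic behind one row of the table, the sketch below relates MTBF to a checkpoint interval using Young's first-order approximation, $T \approx \sqrt{2 \cdot C \cdot \text{MTBF}}$, where $C$ is the time to write one checkpoint. Whether `ReliabilityModel` uses this exact formula is an assumption, not something stated here. The function names and all input values are hypothetical, and the cluster MTBF assumes independent node failures.

```python
import math


def cluster_mtbf(node_mtbf_hours, num_nodes):
    """Mean time between failures for the whole cluster.

    Assumes independent node failures, so the failure rates add up.
    """
    return node_mtbf_hours / num_nodes


def young_checkpoint_interval(checkpoint_cost_hours, mtbf_hours):
    """Young's first-order approximation of the optimal checkpoint interval."""
    return math.sqrt(2 * checkpoint_cost_hours * mtbf_hours)


# Placeholder values for illustration only, NOT measured or datasheet numbers.
mtbf = cluster_mtbf(node_mtbf_hours=10_000, num_nodes=1_000)
interval = young_checkpoint_interval(checkpoint_cost_hours=0.05, mtbf_hours=mtbf)

print(f"Cluster MTBF:        {mtbf:.1f} h")
print(f"Checkpoint interval: {interval:.2f} h")
```

The square root means the interval shrinks slowly as the cluster grows: a cluster with four times as many nodes calls for checkpoints about twice as often.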

Not using the textbook? No problem -- MLSYSIM is self-contained. The [Math Foundations](math.qmd) page documents every equation, and each slide deck stands on its own with full speaker notes.

---

## Recommended Study Workflow

Whether you are self-studying or following a course, this workflow is designed to help the material stick:

1. **Read** the textbook chapter (or skim the slide deck) to get the conceptual framework
2. **Predict** what will happen before running any code -- write it down
3. **Simulate** using MLSYSIM to test your prediction against real hardware specs
4. **Explore** by changing one parameter at a time (batch size, precision, hardware) and observing the effect
5. **Reflect** on where your prediction was wrong -- that gap is where learning happens
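
Steps 2 through 5 can be rehearsed in plain Python before touching any solver. The sketch below is not MLSYSIM code: it uses a toy model in which weights are read once per batch, and every name and number in it (`arithmetic_intensity`, `RIDGE_POINT`, the workload sizes) is a hypothetical placeholder. It varies batch size only, holding everything else fixed, and then compares the result with a written-down guess.

```python
def arithmetic_intensity(batch_size, flops_per_sample, weight_bytes,
                         act_bytes_per_sample):
    """FLOPs per byte moved, assuming weights are read once and shared by the batch."""
    total_flops = batch_size * flops_per_sample
    total_bytes = weight_bytes + batch_size * act_bytes_per_sample
    return total_flops / total_bytes


# Placeholder values chosen for easy arithmetic, NOT datasheet numbers.
RIDGE_POINT = 100.0  # FLOP/byte at which the hypothetical device becomes compute-bound

# Step 2 (Predict): write the guess down before running the sweep.
predicted_crossover = 16

# Steps 3-4 (Simulate, Explore): change batch size only.
first_compute_bound = None
for batch_size in [1, 2, 4, 8, 16, 32]:
    ai = arithmetic_intensity(batch_size, flops_per_sample=2e9,
                              weight_bytes=100e6, act_bytes_per_sample=5e6)
    regime = "compute-bound" if ai >= RIDGE_POINT else "memory-bound"
    print(f"batch={batch_size:>2}  intensity={ai:6.1f} FLOP/byte  {regime}")
    if first_compute_bound is None and regime == "compute-bound":
        first_compute_bound = batch_size

# Step 5 (Reflect): compare the guess with the observed crossover.
print(f"predicted {predicted_crossover}, observed {first_compute_bound}")
```

In this toy setup the workload crosses into the compute-bound regime at batch size 8, earlier than the guess of 16. Explaining that gap (weight traffic is amortized faster than expected) is the reflection step.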

::: {.callout-note}
## Self-Study vs. Classroom
If you are self-studying, the slide decks include **speaker notes** with timing guidance, teaching tips, and common misconceptions -- they are written to be useful even without an instructor. If you are in a course, your instructor may assign specific tutorials as homework; check the [Instructor Guide](for-instructors.qmd) for the recommended pairing.
:::

---

## Slides at a Glance

The full slide collection covers both volumes of the textbook. Every deck includes speaker notes, active learning exercises, and original SVG diagrams.

:::: {.columns}

::: {.column width="50%"}
**Volume I: Foundations** (17 decks, 570 slides)

Covers the single-machine ML stack: data engineering, neural computation, architectures, frameworks, training, compression, hardware acceleration, serving, and operations.

[Browse Vol I Decks](https://mlsysbook.ai/slides/vol1.html){target="_blank"} | [Download All (PDF)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/MLSysBook-Slides-Vol1-PDF.zip){target="_blank"}
:::

::: {.column width="50%"}
**Volume II: At Scale** (18 decks, 529 slides)

Covers distributed infrastructure: compute clusters, network fabrics, distributed training, fault tolerance, fleet orchestration, inference at scale, and governance.

[Browse Vol II Decks](https://mlsysbook.ai/slides/vol2.html){target="_blank"} | [Download All (PDF)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/MLSysBook-Slides-Vol2-PDF.zip){target="_blank"}
:::

::::

---

## Next Steps

- **[Getting Started](getting-started.qmd)** -- Install MLSYSIM and run your first analysis
- **[Hello, Roofline Tutorial](tutorials/00_hello_roofline.qmd)** -- Your first roofline analysis
- **[Solver Guide](solver-guide.qmd)** -- Deep dive into each solver's capabilities
- **[Glossary](glossary.qmd)** -- Look up any unfamiliar term
- **[Math Foundations](math.qmd)** -- The equations behind every solver
- **[All Slide Decks](https://mlsysbook.ai/slides/)** -- 35 Beamer decks with speaker notes and active learning exercises