---
title: "Tutorials"
subtitle: "Step-by-step guides for modeling ML systems."
format:
  html:
    toc: false
---

These tutorials are designed to build intuition for ML systems using the `mlsysim` framework.
They map directly to chapters in the *Machine Learning Systems* textbook—start at the beginning
or jump to any topic.
::: {.tutorial-grid}

::: {.tutorial-card}
[Beginner]{.tutorial-level .level-beginner}

### Hello World: Single-Node Roofline

Learn to lower a model onto hardware and identify the performance bottleneck.
Understand memory-bound vs. compute-bound in 5 minutes.

[Start Tutorial →](hello_world.qmd){.tutorial-arrow}
:::

::: {.tutorial-card}
[Intermediate]{.tutorial-level .level-intermediate}

### Sustainability Lab: Carbon Footprint

Calculate the energy and CO₂ cost of training a frontier LLM across different
geographical grid regions. Quebec vs. Poland—the numbers will surprise you.

[Start Tutorial →](sustainability.qmd){.tutorial-arrow}
:::

::: {.tutorial-card}
[Intermediate]{.tutorial-level .level-intermediate}

### LLM Serving: TTFT, ITL & the Memory Wall

Model the two physical regimes of autoregressive generation: the compute-bound
prefill phase and the memory-bound decode phase. Discover how quantization
and hardware choice affect each phase differently.

[Start Tutorial →](llm_serving.qmd){.tutorial-arrow}
:::

::: {.tutorial-card}
[Advanced]{.tutorial-level .level-advanced}

### Distributed Training: 3D Parallelism

Explore Data, Tensor, and Pipeline parallelism overhead. Model the ring all-reduce
communication cost and pipeline bubble fraction on a 256-GPU H100 cluster.

[Start Tutorial →](distributed.qmd){.tutorial-arrow}
:::

:::
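To preview the kind of first-principles arithmetic the Hello World tutorial builds on, here is a back-of-envelope roofline check in plain Python. It uses no `mlsysim` API; the device and model numbers are illustrative placeholders, not vetted specs:

```python
# Back-of-envelope roofline: is a workload compute-bound or memory-bound?
# A roofline lower-bounds runtime by max(compute time, memory-traffic time).

def roofline_time_s(flops: float, bytes_moved: float,
                    peak_flops: float, peak_bw: float) -> tuple[float, str]:
    """Return (runtime lower bound in seconds, limiting resource)."""
    t_compute = flops / peak_flops    # time if compute were the only limit
    t_memory = bytes_moved / peak_bw  # time if bandwidth were the only limit
    if t_memory > t_compute:
        return t_memory, "memory-bound"
    return t_compute, "compute-bound"

# Illustrative example: a 7B-parameter model decoding one token in FP16.
# Decoding streams every weight once: ~14 GB moved for ~14 GFLOP of work.
flops = 2 * 7e9        # ~2 FLOPs per parameter per token
bytes_moved = 2 * 7e9  # 2 bytes per FP16 parameter
peak_flops = 100e12    # hypothetical 100 TFLOP/s accelerator
peak_bw = 1e12         # hypothetical 1 TB/s memory bandwidth

t, regime = roofline_time_s(flops, bytes_moved, peak_flops, peak_bw)
print(f"{regime}: >= {t * 1e3:.1f} ms per token")  # memory-bound: >= 14.0 ms
```

With these numbers, decoding is memory-bound by two orders of magnitude, which is why the LLM Serving tutorial treats prefill and decode as separate physical regimes.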
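The Sustainability Lab's core calculation is similarly small: facility energy times grid carbon intensity. A minimal sketch, again without the `mlsysim` API; every number below (GPU power, PUE, grid intensities) is an illustrative round figure, not authoritative data:

```python
# Back-of-envelope training carbon: energy drawn x grid carbon intensity.

def training_co2_tonnes(gpu_count: int, gpu_power_w: float, hours: float,
                        pue: float, grid_g_per_kwh: float) -> float:
    """Tonnes of CO2 for a training run on a given electrical grid."""
    it_energy_kwh = gpu_count * gpu_power_w * hours / 1000.0
    facility_energy_kwh = it_energy_kwh * pue  # PUE scales IT draw to facility draw
    return facility_energy_kwh * grid_g_per_kwh / 1e6  # grams -> tonnes

# Same hypothetical run, two grids (illustrative intensities in gCO2/kWh):
run = dict(gpu_count=256, gpu_power_w=700, hours=720, pue=1.2)
for region, intensity in [("low-carbon hydro grid", 30),
                          ("coal-heavy grid", 700)]:
    print(f"{region}: {training_co2_tonnes(**run, grid_g_per_kwh=intensity):.1f} t CO2")
```

The run is identical in both cases; only the grid changes, yet the footprint differs by more than 20x—the effect the Sustainability Lab quantifies with real grid profiles.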
---

## Learning Path

If you're new to ML systems modeling, we recommend this sequence:

1. **[Hello World](hello_world.qmd)** — Understand the roofline model and what determines inference speed.
2. **[Sustainability Lab](sustainability.qmd)** — Apply the framework to a real-world carbon analysis.
3. **[LLM Serving Lab](llm_serving.qmd)** — Model TTFT, ITL, and KV-cache pressure for production LLM serving.
4. **[Distributed Training](distributed.qmd)** — Scale to hundreds of GPUs and analyze where efficiency is lost.
5. **[Hardware Zoo](../zoo/hardware.qmd)** — Explore the vetted hardware specifications across deployment tiers.
6. *(Optional)* **[Math Foundations](../math.qmd)** — The first-principles equations behind every solver.

> **Tip:** All tutorials are Jupyter/Quarto compatible. Run them locally after `pip install mlsysim`.