---
title: "Getting Started"
subtitle: "Install MLSYSIM and run your first analysis in under 5 minutes."
---

::: {.callout-note}
## Prerequisites
MLSYSIM assumes basic Python familiarity (variables, functions, `pip install`). No prior ML or hardware knowledge is required. Key concepts like **roofline analysis**, **memory-bound vs. compute-bound**, and **FLOP/s** are explained in context throughout the tutorials. For a full reference of terms, see the [Glossary](glossary.qmd).
:::

## Installation

MLSYSIM requires Python 3.10+ and installs cleanly with pip:

```bash
pip install mlsysim
```

For development or to follow along with tutorials locally:

```bash
git clone https://github.com/harvard-edge/cs249r_book
cd cs249r_book/mlsysim
pip install -e ".[dev]"
```

Verify the installation:

```bash
python -c "import mlsysim; print(mlsysim.__version__)"
```

::: {.callout-tip}
## Local install recommended for now
Tutorials are pure Python and run in any Python 3.10+ environment. Hosted **Google Colab** and **Binder** launch buttons are planned for a future release; until then, install locally with the steps above.
:::

---

## Your First Analysis

Once installed, you can run a complete roofline analysis in a few lines of Python. The roofline model is the foundation of ML systems performance reasoning -- it determines whether your workload is limited by compute (arithmetic units) or memory (data movement). For a visual walkthrough, see the [Hardware Acceleration slide deck (Vol I, Ch 11)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_11_hw_acceleration.pdf){target="_blank"}.

```python
import mlsysim
from mlsysim import Engine

# 1. Load a model and hardware from the vetted Zoo
model = mlsysim.Models.ResNet50
hardware = mlsysim.Hardware.Cloud.A100

# 2. Solve -- the Engine applies the roofline model
profile = Engine.solve(model=model, hardware=hardware, batch_size=1, precision="fp16")

# 3. Read the results
print(f"Bottleneck: {profile.bottleneck}")          # → 'Memory'
print(f"Latency: {profile.latency.to('ms'):~.2f}")  # → 0.54 ms
print(f"Throughput: {profile.throughput:.0f}")      # → 1843 / second
```
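
To make the roofline reasoning concrete, the sketch below does the same kind of calculation by hand in plain Python. It is not MLSYSIM's implementation: the helper name `roofline_estimate` and every number passed to it (work per batch, bytes moved, peak FLOP/s, memory bandwidth) are illustrative assumptions rather than Zoo values.

```python
def roofline_estimate(flops, bytes_moved, peak_flops, mem_bandwidth, batch_size=1):
    """Hypothetical helper: classic roofline bound for one batch."""
    latency_compute = flops / peak_flops          # seconds if only arithmetic were the limit
    latency_memory = bytes_moved / mem_bandwidth  # seconds if only data movement were the limit
    latency = max(latency_compute, latency_memory)
    bottleneck = "Memory" if latency_memory > latency_compute else "Compute"
    return {
        "bottleneck": bottleneck,
        "latency_s": latency,
        "throughput": batch_size / latency,
        "arithmetic_intensity": flops / bytes_moved,
    }

# Illustrative numbers only (assumed, not taken from a datasheet)
result = roofline_estimate(
    flops=8e9,           # 8 GFLOP of work per batch
    bytes_moved=100e6,   # 100 MB read and written per batch
    peak_flops=300e12,   # 300 TFLOP/s peak
    mem_bandwidth=2e12,  # 2 TB/s memory bandwidth
)
print(result["bottleneck"])
print(f"{result['latency_s'] * 1e3:.3f} ms")
```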

::: {.callout-note}
## Working with units
MLSYSIM uses the [Pint](https://pint.readthedocs.io/) library for physical units. All quantities carry attached units (ms, GB, TFLOP/s, etc.). Use `.to('ms')` to convert between units. Use `.magnitude` to extract the raw number when you need it for calculations or plotting.
:::

---

## Understanding the Output

`Engine.solve()` returns a `PerformanceProfile` -- a structured result containing everything the roofline model can tell you about your workload.

### Core fields

| Field | What it means |
|:------|:--------------|
| `bottleneck` | `'Memory'` or `'Compute'` -- which resource limits performance |
| `latency` | Time to process one batch, derived from the roofline ceiling |
| `throughput` | Samples per second = `batch_size / latency` |
| `latency_compute` | Time if only compute were the constraint |
| `latency_memory` | Time if only memory bandwidth were the constraint |
| `arithmetic_intensity` | Operations per byte -- the x-axis of the roofline plot |

### Extended fields

| Field | What it means |
|:------|:--------------|
| `energy` | Estimated energy consumption (Joules) |
| `memory_footprint` | Total memory required for the workload |
| `mfu` | Model FLOPs Utilization -- fraction of peak compute achieved |
| `feasible` | Whether the workload fits in device memory |

::: {.callout-tip}
## The key insight
If `latency_memory > latency_compute`, you are **memory-bound**: faster arithmetic units will not help.
You need to increase batch size, use a more compute-dense operation (e.g., fused attention), or reduce
data movement. If you are **compute-bound**, that is when parallelism and quantization pay off.

This is the same insight taught in the [Neural Network Computation slides (Vol I, Ch 5)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_05_nn_computation.pdf){target="_blank"} and the [Performance Engineering slides (Vol II, Ch 10)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_10_performance_engineering.pdf){target="_blank"}.
:::
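
The advice to increase batch size can be illustrated with a small stand-alone sketch. It assumes, as a simplification, that weights are read once per batch while activation traffic grows with batch size, and it compares the resulting arithmetic intensity with the hardware ridge point (peak compute divided by memory bandwidth). The function name and all numbers are hypothetical and are not taken from MLSYSIM.

```python
def arithmetic_intensity(batch_size, flops_per_sample, weight_bytes, activation_bytes_per_sample):
    """FLOPs per byte, assuming weights are read once per batch (a simplification)."""
    total_flops = batch_size * flops_per_sample
    total_bytes = weight_bytes + batch_size * activation_bytes_per_sample
    return total_flops / total_bytes

# Assumed hardware: the ridge point separates memory-bound from compute-bound workloads
peak_flops = 300e12                       # FLOP/s (illustrative)
mem_bandwidth = 2e12                      # bytes/s (illustrative)
ridge_point = peak_flops / mem_bandwidth  # FLOP per byte

regimes = {}
for batch_size in (1, 8, 64, 512):
    ai = arithmetic_intensity(
        batch_size,
        flops_per_sample=8e9,              # assumed
        weight_bytes=50e6,                 # assumed: 25M parameters in fp16
        activation_bytes_per_sample=20e6,  # assumed
    )
    regimes[batch_size] = "memory-bound" if ai < ridge_point else "compute-bound"
    print(f"batch={batch_size:4d}  intensity={ai:7.1f} FLOP/byte  {regimes[batch_size]}")
```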

---

## Exploring the Zoo

MLSYSIM ships with vetted registries of hardware, models, infrastructure, and systems -- all sourced from real datasheets. Use tab-completion to explore.

### Hardware

Five tiers spanning the full deployment spectrum:

```python
# Cloud accelerators
mlsysim.Hardware.Cloud.A100
mlsysim.Hardware.Cloud.H100
mlsysim.Hardware.Cloud.H200

# Workstation / desktop GPUs
mlsysim.Hardware.Workstation.DGX_Spark

# Mobile processors
mlsysim.Hardware.Mobile.iPhone15Pro
mlsysim.Hardware.Mobile.Snapdragon8Gen3

# Edge devices
mlsysim.Hardware.Edge.JetsonOrinNX

# Tiny / microcontroller targets
mlsysim.Hardware.Tiny.ESP32
mlsysim.Hardware.Tiny.HimaxWE1
```

For the theory behind this hardware spectrum, see the [Compute Infrastructure slides (Vol II, Ch 2)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_02_compute_infrastructure.pdf){target="_blank"}.

### Models

Organized by application domain:

```python
# Language models
mlsysim.Models.Language.GPT2
mlsysim.Models.Language.Llama3_8B
mlsysim.Models.Language.Llama3_70B

# Vision models
mlsysim.Models.Vision.ResNet50
mlsysim.Models.Vision.MobileNetV2
mlsysim.Models.Vision.AlexNet

# Tiny / edge models
mlsysim.Models.Tiny.DS_CNN
mlsysim.Models.Tiny.WakeVision
```

### Infrastructure

Regional grids and datacenter configurations for sustainability analysis:

```python
# Regional power grids -- carbon intensity varies by energy source
mlsysim.Infra.Grids.Quebec   # hydro: ~20 gCO2/kWh
mlsysim.Infra.Grids.US_Avg   # mixed: ~390 gCO2/kWh
mlsysim.Infra.Grids.Poland   # coal:  ~820 gCO2/kWh
```
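
As a rough illustration of why the grid matters, the sketch below multiplies an assumed energy budget by the approximate carbon intensities quoted in the comments above. The helper `carbon_kg`, the 10,000 kWh figure, and the `pue` parameter (power usage effectiveness, defaulting to 1.0) are assumptions for illustration; this does not reproduce how MLSYSIM's own sustainability analysis is computed.

```python
# Approximate grid carbon intensities quoted above, in gCO2 per kWh
GRID_INTENSITY = {"Quebec": 20, "US_Avg": 390, "Poland": 820}

def carbon_kg(energy_kwh, grid, pue=1.0):
    """Hypothetical helper: operational emissions in kg CO2 for a given grid."""
    grams = energy_kwh * pue * GRID_INTENSITY[grid]
    return grams / 1000.0

energy_kwh = 10_000  # assumed energy for one training run, illustration only
for grid in GRID_INTENSITY:
    print(f"{grid:8s} {carbon_kg(energy_kwh, grid):8.0f} kg CO2")
```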

The [Sustainable AI slides (Vol II, Ch 15)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_15_sustainable_ai.pdf){target="_blank"} explain why datacenter location is a first-class engineering decision.

### Systems

Cluster definitions for distributed analysis:

```python
# Network fabrics
mlsysim.Systems.Fabrics.InfiniBand_NDR
mlsysim.Systems.Fabrics.Ethernet_100G

# Pre-configured clusters
mlsysim.Systems.Clusters.Frontier_8K
mlsysim.Systems.Clusters.Research_256
```

For the full topology and cluster modeling, see the [Distributed Training slides (Vol II, Ch 5)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_05_distributed_training.pdf){target="_blank"} and [Network Fabrics slides (Vol II, Ch 3)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_03_network_fabrics.pdf){target="_blank"}.

Complete registry listings are available in the [Zoo reference pages](zoo/index.qmd).

---

## Adjusting the Efficiency Parameter

The `efficiency` parameter (η) is the single most important tuning knob in
MLSYSIM. It represents the fraction of theoretical peak hardware performance
that is actually achieved in practice. Most GPUs run at 2--5% of peak without optimization; well-tuned workloads reach 35--55%.

```python
# Default: well-optimized training (η = 0.5)
profile_default = Engine.solve(
    model=model, hardware=hardware,
    batch_size=32, precision="fp16", efficiency=0.5
)

# Conservative: typical inference workload (η = 0.35)
profile_inference = Engine.solve(
    model=model, hardware=hardware,
    batch_size=32, precision="fp16", efficiency=0.35
)

print(f"Training estimate: {profile_default.latency}")
print(f"Inference estimate: {profile_inference.latency}")
```
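
A back-of-the-envelope view of what η does: if only a fraction η of peak FLOP/s is achieved, the compute-limited time grows by a factor of 1/η. The sketch below assumes η scales only the compute ceiling; how MLSYSIM applies it internally may differ, and the helper name and workload numbers are hypothetical.

```python
def compute_latency(flops, peak_flops, efficiency):
    """Hypothetical helper: compute-limited time when only a fraction of peak is achieved."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return flops / (peak_flops * efficiency)

flops = 1e12         # assumed work per batch
peak_flops = 300e12  # assumed peak FLOP/s

t_train = compute_latency(flops, peak_flops, efficiency=0.5)
t_infer = compute_latency(flops, peak_flops, efficiency=0.35)
print(f"eta=0.50: {t_train * 1e3:.2f} ms")
print(f"eta=0.35: {t_infer * 1e3:.2f} ms")
print(f"slowdown: {t_infer / t_train:.2f}x")
```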

Typical efficiency ranges:

| Scenario | η range | Notes |
|:---------|:--------|:------|
| Well-optimized training (fp16) | 0.35--0.55 | Megatron-LM, DeepSpeed |
| Inference (fp16) | 0.25--0.45 | vLLM, TensorRT-LLM |
| Inference (int8) | 0.20--0.40 | Quantized serving |

See the [Accuracy & Validation](accuracy.qmd) page for guidance on choosing η
for different scenarios. The gap between theoretical peak and achieved throughput is covered in detail in the [Performance Engineering slides (Vol II, Ch 10)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_10_performance_engineering.pdf){target="_blank"}.

---

## Defining Custom Models

You are not limited to the Zoo. Define any model by specifying its parameters
and FLOPs:

```python
from mlsysim import TransformerWorkload
from mlsysim import ureg

my_model = TransformerWorkload(
    name="My-Custom-LLM",
    architecture="Transformer",
    parameters=13e9 * ureg.param,
    layers=40,
    hidden_dim=5120,
    heads=40,
    kv_heads=8,
    inference_flops=2 * 13e9 * ureg.flop  # Rule of thumb: ~2 FLOPs per parameter per token
)

profile = Engine.solve(model=my_model, hardware=hardware, batch_size=1)
print(f"Bottleneck: {profile.bottleneck}")
print(f"Latency: {profile.latency}")
print(f"Feasible: {profile.feasible}")  # Does the model fit in device memory?
```
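
The feasibility question can be approximated by hand. The sketch below is a weights-only estimate that multiplies parameter count by bytes per parameter and compares the result with an assumed device capacity; it ignores KV cache, activations, and runtime overhead, so it is likely looser than whatever check sits behind `profile.feasible`. The helper names and the 40 GB capacity are hypothetical.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_footprint_gb(num_params, precision):
    """Hypothetical helper: memory for weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

def fits(num_params, precision, device_memory_gb):
    """Weights-only feasibility check; ignores KV cache, activations, and overhead."""
    return weight_footprint_gb(num_params, precision) <= device_memory_gb

num_params = 13e9      # same parameter count as the example above
device_memory_gb = 40  # assumed device capacity, illustration only

for precision in ("fp32", "fp16", "int8"):
    gb = weight_footprint_gb(num_params, precision)
    print(f"{precision}: {gb:.0f} GB, fits={fits(num_params, precision, device_memory_gb)}")

# Rule of thumb from the example: about 2 FLOPs per parameter per token
flops_per_token = 2 * num_params
```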

The [Model Compression slides (Vol I, Ch 10)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_10_model_compression.pdf){target="_blank"} explain why parameter count and precision together determine both the memory footprint and the arithmetic intensity of a workload.

---

## Companion Slide Decks

MLSYSIM is the hands-on companion to the [Machine Learning Systems](https://mlsysbook.ai) textbook. The concepts you model with MLSYSIM are taught visually in 35 Beamer slide decks (1,099 slides total) with speaker notes and active learning exercises.

| Concept in MLSYSIM | Slide Deck | Key Topics |
|:--------------------|:-----------|:-----------|
| `Engine.solve()` and the roofline model | [Hardware Acceleration (Vol I, Ch 11)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_11_hw_acceleration.pdf){target="_blank"} | Roofline model, arithmetic intensity, systolic arrays, memory wall |
| FLOPs, MACs, and compute cost | [Neural Network Computation (Vol I, Ch 5)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_05_nn_computation.pdf){target="_blank"} | Forward/backward pass cost, training memory breakdown |
| Training memory and mixed precision | [Model Training (Vol I, Ch 8)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_08_training.pdf){target="_blank"} | Iron Law of Training, gradient checkpointing, mixed precision |
| Quantization and compression | [Model Compression (Vol I, Ch 10)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_10_model_compression.pdf){target="_blank"} | Pruning, quantization, knowledge distillation |
| Hardware Zoo tiers | [Compute Infrastructure (Vol II, Ch 2)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_02_compute_infrastructure.pdf){target="_blank"} | Accelerator spectrum, HBM architecture, TCO |
| DistributedModel | [Distributed Training (Vol II, Ch 5)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_05_distributed_training.pdf){target="_blank"} | 3D parallelism, scaling efficiency, communication overhead |
| ServingModel and LLM inference | [Model Serving (Vol I, Ch 13)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_13_model_serving.pdf){target="_blank"} | TTFT, ITL, KV-cache, batching strategies |
| SustainabilityModel | [Sustainable AI (Vol II, Ch 15)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_15_sustainable_ai.pdf){target="_blank"} | Energy wall, carbon geography, PUE |
| Efficiency parameter (η) | [Performance Engineering (Vol II, Ch 10)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol2_10_performance_engineering.pdf){target="_blank"} | Operator fusion, FlashAttention, precision engineering |
| Benchmarking and validation | [Benchmarking (Vol I, Ch 12)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/vol1_12_benchmarking.pdf){target="_blank"} | MLPerf, measurement methodology, latency percentiles |

: {tbl-colwidths="[22,30,48]"}

:::: {.columns}

::: {.column width="50%"}
**[Volume I: Foundations](https://mlsysbook.ai/slides/vol1.html){target="_blank"}** -- 17 decks, 570 slides

[Download All PDFs (ZIP)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/MLSysBook-Slides-Vol1-PDF.zip){target="_blank"}
:::

::: {.column width="50%"}
**[Volume II: At Scale](https://mlsysbook.ai/slides/vol2.html){target="_blank"}** -- 18 decks, 529 slides

[Download All PDFs (ZIP)](https://github.com/harvard-edge/cs249r_book/releases/download/slides-latest/MLSysBook-Slides-Vol2-PDF.zip){target="_blank"}
:::

::::

---

## Next Steps

::: {.callout-tip}
## Recommended path
Follow the [structured learning path](tutorials/index.qmd) on the Tutorials page,
starting with the **[Hello, Roofline Tutorial](tutorials/00_hello_roofline.qmd)**. Each tutorial
pairs with a companion slide deck for visual explanations and active learning exercises.

For a complete reference of which solver to use for different questions, see the
**[Solver Guide](solver-guide.qmd)**.
:::