mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-04-26 17:39:07 -05:00
- Fix landing page latency to match accuracy.qmd (0.34→0.42 ms, 2941→2381 img/s) - Add Three-Level Evaluation wall mapping table to architecture.qmd - Align KV-cache formula: show both simplified and PagedAttention forms - Add 6 missing wall equation sections to math.qmd (Walls 3, 5, 9, 10, 12, 19)
469 lines
21 KiB
Plaintext
---
title: "MLSys·im"
page-layout: custom
sidebar: false
title-block-style: none
include-after-body:
  - text: |
      <script src="styles/landing.js" defer></script>
---
<!-- ============================================================
     HERO (one cohesive dark section)
     ============================================================ -->

::: {.im-hero}
::: {.im-hero-inner}

::: {.im-badge}
Open Source · Companion to [mlsysbook.ai](https://mlsysbook.ai)
:::

::: {.im-title}
MLSys·im
:::

::: {.im-subtitle}
Predict ML system performance, cost, and carbon.<br/>From first principles.
:::

<div class="im-stats">
<div class="im-stat"><span class="im-stat-num">22</span><span class="im-stat-label">Systems Walls</span></div>
<div class="im-stat"><span class="im-stat-num">21</span><span class="im-stat-label">Physics Solvers</span></div>
<div class="im-stat"><span class="im-stat-num">6</span><span class="im-stat-label">Constraint Domains</span></div>
<div class="im-stat"><span class="im-stat-num">&lt;0.3s</span><span class="im-stat-label">Full Analysis</span></div>
</div>

::: {.im-hero-desc}
The MIPS of ML systems. Reason about workloads, from microcontrollers to GPU clusters, without provisioning any hardware. Identify binding constraints, sweep design spaces, and build systems intuition in sub-second iteration cycles.
:::

::: {.im-install}
<code class="im-cmd">pip install mlsysim</code>
<button class="im-copy-btn" id="copy-btn" onclick="copyInstall()">Copy</button>
:::

::: {.im-ctas}
[Get Started](getting-started.qmd){.im-btn .im-btn-primary}
[Tutorials](tutorials/index.qmd){.im-btn .im-btn-ghost}
[Research Paper](whitepaper.qmd){.im-btn .im-btn-ghost}
:::

:::
:::

```{=html}
<div class="im-hero im-hero-showcase">
<div class="im-hero-inner">
<div class="im-carousel">
<div class="im-carousel-track">
<button class="im-arrow im-arrow-prev" aria-label="Previous slide">‹</button>
<button class="im-arrow im-arrow-next" aria-label="Next slide">›</button>
<div class="im-slide im-slide-active" data-index="0">
  <div class="im-slide-label">Roofline Analysis</div>
  <div class="im-slide-viz">
    <svg viewBox="0 0 320 130" class="im-roofline-svg">
      <line x1="40" y1="100" x2="300" y2="100" stroke="rgba(148,163,184,0.3)" stroke-width="1"/>
      <line x1="40" y1="20" x2="40" y2="100" stroke="rgba(148,163,184,0.3)" stroke-width="1"/>
      <text x="170" y="115" fill="#64748b" font-size="9" text-anchor="middle">Arithmetic Intensity (FLOP/Byte)</text>
      <text x="15" y="60" fill="#64748b" font-size="9" text-anchor="middle" transform="rotate(-90,15,60)">FLOP/s</text>
      <line x1="40" y1="95" x2="160" y2="30" stroke="#38bdf8" stroke-width="2"/>
      <line x1="160" y1="30" x2="300" y2="30" stroke="#38bdf8" stroke-width="2"/>
      <circle cx="90" cy="68" r="5" fill="#f59e0b" opacity="0.9"><animate attributeName="cy" values="68;65;68" dur="3s" repeatCount="indefinite"/></circle>
      <text x="100" y="63" fill="#f59e0b" font-size="8">Memory Bound</text>
      <circle cx="230" cy="30" r="5" fill="#10b981" opacity="0.9"><animate attributeName="cy" values="30;27;30" dur="3s" repeatCount="indefinite"/></circle>
      <text x="240" y="25" fill="#10b981" font-size="8">Compute Bound</text>
      <text x="160" y="45" fill="#94a3b8" font-size="7">Ridge Point</text>
      <line x1="160" y1="30" x2="160" y2="100" stroke="rgba(148,163,184,0.2)" stroke-width="1" stroke-dasharray="4,3"/>
    </svg>
  </div>
  <div class="im-slide-caption">Identify whether your workload is memory-bound or compute-bound on any hardware.</div>
</div>
<div class="im-slide" data-index="1">
  <div class="im-slide-label">Hardware Comparison</div>
  <div class="im-slide-viz">
    <svg viewBox="0 0 320 130" class="im-bars-svg">
      <text x="50" y="22" fill="#94a3b8" font-size="9" text-anchor="end">H100</text>
      <rect x="55" y="12" width="0" height="14" rx="3" fill="#38bdf8"><animate attributeName="width" from="0" to="200" dur="1.5s" fill="freeze" begin="0s"/></rect>
      <text x="260" y="23" fill="#94a3b8" font-size="8">990 TFLOP/s</text>

      <text x="50" y="47" fill="#94a3b8" font-size="9" text-anchor="end">A100</text>
      <rect x="55" y="37" width="0" height="14" rx="3" fill="#38bdf8" opacity="0.7"><animate attributeName="width" from="0" to="120" dur="1.5s" fill="freeze" begin="0.1s"/></rect>
      <text x="180" y="48" fill="#94a3b8" font-size="8">312 TFLOP/s</text>

      <text x="50" y="72" fill="#94a3b8" font-size="9" text-anchor="end">Jetson</text>
      <rect x="55" y="62" width="0" height="14" rx="3" fill="#38bdf8" opacity="0.4"><animate attributeName="width" from="0" to="15" dur="1.5s" fill="freeze" begin="0.2s"/></rect>
      <text x="75" y="73" fill="#94a3b8" font-size="8">25 TFLOP/s</text>

      <text x="50" y="97" fill="#94a3b8" font-size="9" text-anchor="end">ESP32</text>
      <rect x="55" y="87" width="0" height="14" rx="3" fill="#38bdf8" opacity="0.2"><animate attributeName="width" from="0" to="2" dur="1.5s" fill="freeze" begin="0.3s"/></rect>
      <text x="62" y="98" fill="#94a3b8" font-size="8">0.5 GFLOP/s</text>
    </svg>
  </div>
  <div class="im-slide-caption">18+ devices from cloud GPUs to microcontrollers, all with vetted datasheet specs.</div>
</div>
<div class="im-slide" data-index="2">
  <div class="im-slide-label">Sustainability Analysis</div>
  <div class="im-slide-viz">
    <svg viewBox="0 0 320 130" class="im-sustain-svg">
      <line x1="85" y1="10" x2="85" y2="110" stroke="rgba(148,163,184,0.1)" stroke-width="1"/>
      <text x="80" y="22" fill="#94a3b8" font-size="9" text-anchor="end">Quebec</text>
      <rect x="85" y="12" width="0" height="14" rx="3" fill="#10b981">
        <animate attributeName="width" from="0" to="10" dur="1.5s" fill="freeze" begin="0s"/>
      </rect>
      <text x="100" y="23" fill="#94a3b8" font-size="8">20 g CO₂/kWh</text>
      <text x="80" y="47" fill="#94a3b8" font-size="9" text-anchor="end">Norway</text>
      <rect x="85" y="37" width="0" height="14" rx="3" fill="#10b981" opacity="0.8">
        <animate attributeName="width" from="0" to="5" dur="1.5s" fill="freeze" begin="0.1s"/>
      </rect>
      <text x="95" y="48" fill="#94a3b8" font-size="8">10 g CO₂/kWh</text>
      <text x="80" y="72" fill="#94a3b8" font-size="9" text-anchor="end">US Avg</text>
      <rect x="85" y="62" width="0" height="14" rx="3" fill="#f59e0b">
        <animate attributeName="width" from="0" to="95" dur="1.5s" fill="freeze" begin="0.2s"/>
      </rect>
      <text x="185" y="73" fill="#94a3b8" font-size="8">390 g CO₂/kWh</text>
      <text x="80" y="97" fill="#94a3b8" font-size="9" text-anchor="end">Poland</text>
      <rect x="85" y="87" width="0" height="14" rx="3" fill="#ef4444">
        <animate attributeName="width" from="0" to="200" dur="1.5s" fill="freeze" begin="0.3s"/>
      </rect>
      <text x="290" y="98" fill="#94a3b8" font-size="8">820 g CO₂/kWh</text>
    </svg>
  </div>
  <div class="im-slide-caption">Same workload, different region. Up to 41x difference in carbon footprint.</div>
</div>
<div class="im-slide" data-index="3">
  <div class="im-slide-label">Data Wall Analysis</div>
  <div class="im-slide-viz">
    <svg viewBox="0 0 320 130" class="im-data-wall-svg">
      <text x="160" y="15" fill="#94a3b8" font-size="9" text-anchor="middle">Waymo AV Ingestion on Jetson</text>
      <!-- Pipeline flow -->
      <rect x="20" y="35" width="80" height="40" rx="4" fill="rgba(56,189,248,0.1)" stroke="#38bdf8" stroke-width="1"/>
      <text x="60" y="52" fill="#38bdf8" font-size="8" font-weight="bold" text-anchor="middle">Storage</text>
      <text x="60" y="68" fill="#94a3b8" font-size="7" text-anchor="middle">7.0 GB/s</text>

      <text x="110" y="58" fill="#94a3b8" font-size="12">→</text>

      <rect x="130" y="35" width="60" height="40" rx="4" fill="rgba(239,68,68,0.1)" stroke="#ef4444" stroke-width="1"/>
      <text x="160" y="52" fill="#ef4444" font-size="8" font-weight="bold" text-anchor="middle">Demand</text>
      <text x="160" y="68" fill="#94a3b8" font-size="7" text-anchor="middle">19.0 TB/hr</text>

      <text x="200" y="58" fill="#94a3b8" font-size="12">→</text>

      <rect x="220" y="35" width="80" height="40" rx="4" fill="rgba(148,163,184,0.1)" stroke="rgba(148,163,184,0.3)" stroke-width="1"/>
      <text x="260" y="52" fill="#94a3b8" font-size="8" font-weight="bold" text-anchor="middle">Compute</text>
      <text x="260" y="68" fill="#ef4444" font-size="7" text-anchor="middle">STALLED</text>

      <rect x="70" y="90" width="180" height="22" rx="4" fill="rgba(239,68,68,0.1)" stroke="rgba(239,68,68,0.3)" stroke-width="1"/>
      <text x="160" y="105" fill="#ef4444" font-size="8" font-weight="bold" text-anchor="middle">Utilization: 2.7x over capacity</text>
    </svg>
  </div>
  <div class="im-slide-caption">Identify throughput bottlenecks between storage hierarchy, IO interconnects, and compute.</div>
</div>
<div class="im-slide" data-index="4">
  <div class="im-slide-label">LLM Serving</div>
  <div class="im-slide-viz">
    <svg viewBox="0 0 320 130" class="im-serving-svg">
      <text x="160" y="15" fill="#94a3b8" font-size="9" text-anchor="middle">Llama-3.1-8B on H100</text>
      <rect x="30" y="30" width="120" height="50" rx="6" fill="rgba(56,189,248,0.1)" stroke="rgba(56,189,248,0.3)" stroke-width="1"/>
      <text x="90" y="46" fill="#38bdf8" font-size="9" font-weight="bold" text-anchor="middle">Pre-fill</text>
      <text x="90" y="64" fill="#7dd3fc" font-size="18" font-weight="bold" text-anchor="middle">4.2 ms</text>
      <text x="90" y="76" fill="#64748b" font-size="7" text-anchor="middle">TTFT (compute-bound)</text>
      <text x="160" y="60" fill="#94a3b8" font-size="14">→</text>
      <rect x="180" y="30" width="120" height="50" rx="6" fill="rgba(16,185,129,0.1)" stroke="rgba(16,185,129,0.3)" stroke-width="1"/>
      <text x="240" y="46" fill="#10b981" font-size="9" font-weight="bold" text-anchor="middle">Decode</text>
      <text x="240" y="64" fill="#6ee7b7" font-size="18" font-weight="bold" text-anchor="middle">0.8 ms</text>
      <text x="240" y="76" fill="#64748b" font-size="7" text-anchor="middle">ITL (memory-bound)</text>
      <rect x="70" y="96" width="180" height="22" rx="4" fill="rgba(245,158,11,0.1)" stroke="rgba(245,158,11,0.3)" stroke-width="1"/>
      <text x="160" y="111" fill="#f59e0b" font-size="8" text-anchor="middle">KV-Cache: 2.1 GB / 80 GB available</text>
    </svg>
  </div>
  <div class="im-slide-caption">Model the two phases of autoregressive inference and KV-cache memory pressure.</div>
</div>
<div class="im-slide" data-index="5">
  <div class="im-slide-label">Distributed Training</div>
  <div class="im-slide-viz">
    <svg viewBox="0 0 320 130" class="im-distributed-svg">
      <text x="160" y="15" fill="#94a3b8" font-size="9" text-anchor="middle">256× H100, GPT-3 175B</text>
      <!-- Parallelism strategy boxes -->
      <rect x="10" y="28" width="95" height="42" rx="6" fill="rgba(124,58,237,0.1)" stroke="rgba(124,58,237,0.3)" stroke-width="1"/>
      <text x="57.5" y="45" fill="#a78bfa" font-size="8" font-weight="bold" text-anchor="middle">Data Parallel</text>
      <text x="57.5" y="62" fill="#c4b5fd" font-size="16" font-weight="bold" text-anchor="middle">32×</text>

      <rect x="112.5" y="28" width="95" height="42" rx="6" fill="rgba(56,189,248,0.1)" stroke="rgba(56,189,248,0.3)" stroke-width="1"/>
      <text x="160" y="45" fill="#38bdf8" font-size="8" font-weight="bold" text-anchor="middle">Tensor Parallel</text>
      <text x="160" y="62" fill="#7dd3fc" font-size="16" font-weight="bold" text-anchor="middle">4×</text>

      <rect x="215" y="28" width="95" height="42" rx="6" fill="rgba(16,185,129,0.1)" stroke="rgba(16,185,129,0.3)" stroke-width="1"/>
      <text x="262.5" y="45" fill="#10b981" font-size="8" font-weight="bold" text-anchor="middle">Pipeline Parallel</text>
      <text x="262.5" y="62" fill="#6ee7b7" font-size="16" font-weight="bold" text-anchor="middle">2×</text>

      <!-- Results row -->
      <line x1="20" y1="82" x2="300" y2="82" stroke="rgba(148,163,184,0.15)" stroke-width="1"/>
      <text x="85" y="98" fill="#94a3b8" font-size="8" text-anchor="middle">Scaling Efficiency</text>
      <text x="85" y="118" fill="#a78bfa" font-size="18" font-weight="bold" text-anchor="middle">74%</text>
      <text x="235" y="98" fill="#94a3b8" font-size="8" text-anchor="middle">Pipeline Bubble</text>
      <text x="235" y="114" fill="#f59e0b" font-size="18" font-weight="bold" text-anchor="middle">6.3%</text>
    </svg>
  </div>
  <div class="im-slide-caption">3D parallelism decomposition: data, tensor, and pipeline parallel scaling on GPU clusters.</div>
</div>
<div class="im-slide" data-index="6">
  <div class="im-slide-label">Total Cost of Ownership</div>
  <div class="im-slide-viz">
    <svg viewBox="0 0 320 130" class="im-tco-svg">
      <text x="160" y="15" fill="#94a3b8" font-size="9" text-anchor="middle">64× H100 Cluster, 3-Year TCO</text>
      <!-- Stacked cost bars -->
      <text x="50" y="42" fill="#94a3b8" font-size="9" text-anchor="end">CapEx</text>
      <rect x="55" y="30" width="0" height="16" rx="3" fill="#38bdf8"><animate attributeName="width" from="0" to="200" dur="1.5s" fill="freeze"/></rect>
      <text x="260" y="42" fill="#94a3b8" font-size="8">$2.0M</text>

      <text x="50" y="68" fill="#94a3b8" font-size="9" text-anchor="end">Energy</text>
      <rect x="55" y="56" width="0" height="16" rx="3" fill="#f59e0b"><animate attributeName="width" from="0" to="120" dur="1.5s" fill="freeze" begin="0.1s"/></rect>
      <text x="180" y="68" fill="#94a3b8" font-size="8">$1.2M</text>

      <text x="50" y="94" fill="#94a3b8" font-size="9" text-anchor="end">Maint.</text>
      <rect x="55" y="82" width="0" height="16" rx="3" fill="#10b981"><animate attributeName="width" from="0" to="50" dur="1.5s" fill="freeze" begin="0.2s"/></rect>
      <text x="110" y="94" fill="#94a3b8" font-size="8">$0.5M</text>

      <!-- Total -->
      <line x1="55" y1="108" x2="260" y2="108" stroke="rgba(148,163,184,0.2)" stroke-width="1"/>
      <text x="55" y="124" fill="#94a3b8" font-size="9">Total TCO</text>
      <text x="260" y="124" fill="#e2e8f0" font-size="14" font-weight="bold" text-anchor="end">$3.7M</text>
    </svg>
  </div>
  <div class="im-slide-caption">Break down hardware, energy, and maintenance costs over any time horizon.</div>
</div>
</div>
<div class="im-carousel-dots">
<button class="im-dot im-dot-active" data-slide="0" aria-label="Roofline Analysis"></button>
<button class="im-dot" data-slide="1" aria-label="Hardware Comparison"></button>
<button class="im-dot" data-slide="2" aria-label="Sustainability"></button>
<button class="im-dot" data-slide="3" aria-label="Data Wall"></button>
<button class="im-dot" data-slide="4" aria-label="LLM Serving"></button>
<button class="im-dot" data-slide="5" aria-label="Distributed Training"></button>
<button class="im-dot" data-slide="6" aria-label="Total Cost of Ownership"></button>
</div>
</div>
</div>
</div>
```

<!-- ============================================================
     CONTENT SECTIONS
     ============================================================ -->

::: {.im-content}

<!-- QUICK EXAMPLE -->
::: {.im-section}

::: {.im-section-header}
### Try it in 5 lines
:::

```python
import mlsysim
from mlsysim import Engine

profile = Engine.solve(
    model=mlsysim.Models.ResNet50,
    hardware=mlsysim.Hardware.Cloud.A100,
    batch_size=1,
    precision="fp16",
)

print(f"Bottleneck: {profile.bottleneck}")            # → Memory Bound
print(f"Latency: {profile.latency.to('ms'):~.2f}")    # → 0.42 ms
print(f"Throughput: {profile.throughput:.0f} img/s")  # → 2381 img/s
```

At batch size 1, ResNet-50 loads ~50 MB of weights but performs only ~8 GFLOPs of compute, making it firmly memory-bound on any modern GPU. The solver identifies this in microseconds using the **Iron Law** [@williams2009roofline]:

$$T = \max\!\left(\frac{\text{FLOPs}}{\text{Peak} \times \eta},\ \frac{\text{Bytes}}{\text{BW}}\right)$$

Here $\text{Peak}$ is the accelerator's peak FLOP/s, $\eta$ is the fraction of that peak actually achieved, and $\text{BW}$ is the memory bandwidth.
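
To make the Iron Law concrete, here is a minimal standalone sketch of the same max-of-two-terms calculation. It does not use the mlsysim API: the function name `roofline_time` and the workload and device numbers are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def roofline_time(flops, bytes_moved, peak_flops, bandwidth, eta=1.0):
    """Return (seconds, regime) from T = max(FLOPs / (Peak * eta), Bytes / BW)."""
    compute_s = flops / (peak_flops * eta)   # time if compute were the only limit
    memory_s = bytes_moved / bandwidth       # time if memory traffic were the only limit
    regime = "Compute Bound" if compute_s >= memory_s else "Memory Bound"
    return max(compute_s, memory_s), regime


# Hypothetical workload and device (assumptions, not datasheet values).
t, regime = roofline_time(
    flops=1e9,           # 1 GFLOP of work
    bytes_moved=100e6,   # 100 MB of memory traffic
    peak_flops=100e12,   # 100 TFLOP/s peak
    bandwidth=1e12,      # 1 TB/s memory bandwidth
    eta=0.5,             # assume half of peak is achievable
)
print(f"{regime}: {t * 1e6:.0f} us")  # → Memory Bound: 100 us
```

In this example the compute term is 20 µs and the memory term is 100 µs, so the larger (memory) term sets the latency. Raising the batch size increases FLOPs faster than bytes, which is what eventually moves a workload across the ridge point.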

:::

<!-- WHAT YOU CAN MODEL -->
::: {.im-section}

::: {.im-section-header}
### 22 walls, 21 solvers, one framework
:::

Every physical constraint that bounds an ML system, from single-accelerator compute ceilings to fleet-scale carbon emissions, is modeled by an analytical solver. No hardware required.

::: {.im-solvers-grid}

::: {.im-solver-card .im-solver-roofline}
::: {.im-solver-icon}
:::
**Roofline Analysis**
Compute vs. memory bottleneck identification using the Iron Law. Single-node latency and throughput.
:::

::: {.im-solver-card .im-solver-distributed}
::: {.im-solver-icon}
:::
**3D Parallelism**
Data, tensor, and pipeline parallel scaling efficiency. Ring all-reduce and pipeline bubble overhead.
:::

::: {.im-solver-card .im-solver-serving}
::: {.im-solver-icon}
:::
**LLM Serving**
Time-to-first-token (TTFT), inter-token latency (ITL), and KV-cache memory pressure.
:::

::: {.im-solver-card .im-solver-data}
::: {.im-solver-icon}
:::
**Data Pipeline**
Data Wall analysis. Throughput bottlenecks between storage hierarchy, IO interconnects, and compute.
:::

::: {.im-solver-card .im-solver-scaling}
::: {.im-solver-icon}
:::
**Scaling Physics**
Chinchilla scaling laws. Compute-optimal model size ($P$) and token count ($D$) for any budget.
:::

::: {.im-solver-card .im-solver-orchestration}
::: {.im-solver-icon}
:::
**Cluster Orchestration**
Wait Wall analysis. Little's Law and queueing theory for researcher wait times and cluster utilization.
:::

::: {.im-solver-card .im-solver-compression}
::: {.im-solver-icon}
:::
**Model Compression**
Accuracy vs. efficiency trade-offs. Predict memory savings and accuracy impact of quantization and pruning.
:::

::: {.im-solver-card .im-solver-tco}
::: {.im-solver-icon}
:::
**Total Cost of Ownership**
CapEx, OpEx, electricity, maintenance, and per-query economics over any time horizon.
:::

::: {.im-solver-card .im-solver-sustain}
::: {.im-solver-icon}
:::
**Sustainability**
Energy, carbon footprint (kg CO₂e), and water usage across datacenter regions.
:::

::: {.im-solver-card .im-solver-reliability}
::: {.im-solver-icon}
:::
**Reliability**
Fleet MTBF, failure probability, and Young-Daly optimal checkpoint interval.
:::

:::
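
As a rough illustration of the KV-cache pressure that the LLM Serving card refers to, the sketch below applies the simplified cache-size formula: one K and one V tensor per layer per cached token. The function name `kv_cache_bytes` and the 16,384-token load are assumptions made for illustration and are not the mlsysim API. The layer and head counts are those of the published Llama-3.1-8B configuration.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_elem=2):
    """Simplified KV-cache size: K and V tensors for every layer and cached token."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem


# Llama-3.1-8B: 32 layers, 8 KV heads (grouped-query attention), head_dim 128.
# 16,384 cached tokens is a hypothetical load, e.g. batch 8 x 2,048 tokens, in fp16.
size = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, tokens=16_384)
print(f"KV-cache: {size / 1e9:.1f} GB")  # → KV-cache: 2.1 GB
```

Each cached token costs 128 KiB under these assumptions, so the cache grows linearly with batch size and context length while the weights stay fixed. This ignores paging and fragmentation overheads.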

:::
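
The Reliability card mentions fleet MTBF and the Young-Daly checkpoint interval. A minimal sketch of both follows, assuming independent node failures (so fleet MTBF is node MTBF divided by node count) and the first-order Young-Daly approximation $\tau \approx \sqrt{2\,\delta\,M}$, where $\delta$ is the time to write one checkpoint and $M$ is the fleet MTBF. The function names and all numbers are hypothetical.

```python
import math


def fleet_mtbf_hours(node_mtbf_hours, num_nodes):
    """Fleet MTBF under independent, identically distributed node failures."""
    return node_mtbf_hours / num_nodes


def young_daly_interval_s(checkpoint_cost_s, mtbf_s):
    """First-order optimal time between checkpoints: sqrt(2 * delta * M)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


# Hypothetical fleet: 1,000 nodes, 50,000 h MTBF each, 60 s per checkpoint.
mtbf_h = fleet_mtbf_hours(node_mtbf_hours=50_000, num_nodes=1_000)
interval_s = young_daly_interval_s(checkpoint_cost_s=60, mtbf_s=mtbf_h * 3600)
print(f"Fleet MTBF: {mtbf_h:.0f} h, checkpoint every {interval_s / 60:.0f} min")
```

The square-root form shows why larger fleets checkpoint more often: doubling the node count halves the fleet MTBF and shortens the optimal interval by a factor of about 1.4.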

<!-- TUTORIALS -->
::: {.im-section}

::: {.im-section-header}
### Learn by doing
:::

::: {.im-tutorial-grid}

::: {.im-tutorial-card}
[Beginner]{.im-tutorial-badge .im-badge-beginner}

#### [Hello World](tutorials/hello_world.qmd)
Memory-bound vs. compute-bound in 5 lines of Python. Sweep batch sizes and see the roofline crossover.
:::

::: {.im-tutorial-card}
[Intermediate]{.im-tutorial-badge .im-badge-intermediate}

#### [LLM Serving](tutorials/llm_serving.qmd)
Model the two phases of autoregressive generation (pre-fill and decode) and diagnose KV-cache pressure.
:::

::: {.im-tutorial-card}
[Intermediate]{.im-tutorial-badge .im-badge-intermediate}

#### [Distributed Training](tutorials/distributed.qmd)
Ring all-reduce communication, pipeline bubbles, and scaling efficiency on 256 GPUs.
:::

::: {.im-tutorial-card}
[Advanced]{.im-tutorial-badge .im-badge-advanced}

#### [Sustainability Lab](tutorials/sustainability.qmd)
Same model, same GPU, yet up to 41x difference in carbon footprint depending on where you train.
:::

:::
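
For a feel of the quantities the Distributed Training tutorial works with, the sketch below evaluates two standard closed forms: the pipeline bubble fraction $(p-1)/(m+p-1)$ for $p$ stages and $m$ microbatches, and the bandwidth term of a ring all-reduce, in which each GPU transfers $2(N-1)/N$ times the message size. These are textbook expressions rather than a statement of what mlsysim computes internally. The function names, the microbatch count, the message size, and the link bandwidth are all assumptions.

```python
def pipeline_bubble_fraction(stages, microbatches):
    """Idle fraction of a synchronous pipeline schedule: (p - 1) / (m + p - 1)."""
    return (stages - 1) / (microbatches + stages - 1)


def ring_allreduce_seconds(message_bytes, num_gpus, link_bandwidth):
    """Bandwidth-only cost of ring all-reduce (per-hop latency is ignored)."""
    return 2 * (num_gpus - 1) / num_gpus * message_bytes / link_bandwidth


# 2 pipeline stages with a hypothetical 15 microbatches per step.
bubble = pipeline_bubble_fraction(stages=2, microbatches=15)
print(f"Pipeline bubble: {bubble:.2%}")  # → Pipeline bubble: 6.25%

# Hypothetical 1 GB gradient bucket across 32 data-parallel ranks at 50 GB/s.
t = ring_allreduce_seconds(message_bytes=1e9, num_gpus=32, link_bandwidth=50e9)
print(f"All-reduce: {t * 1e3:.2f} ms")
```

The bubble shrinks as microbatches increase, while the all-reduce term approaches twice the message size over the link bandwidth as the ring grows, which is why it is nearly independent of GPU count at scale.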

:::

<!-- WHO USES IT -->
::: {.im-section}

::: {.im-section-header}
### Built for
:::

::: {.im-audience}

::: {.im-audience-item .im-aud-student}
[**Students**](for-students.qmd)

Build intuition for *why* ML systems behave as they do. Run roofline analysis, see the memory wall, and compute carbon footprints, all without GPU hardware. [See learning path →](for-students.qmd)
:::

::: {.im-audience-item .im-aud-instructor}
[**Instructors**](for-instructors.qmd)

Assign analytically grounded problem sets with deterministic, reproducible outputs. All specs sourced from vetted datasheets. [See course integration →](for-instructors.qmd)
:::

::: {.im-audience-item .im-aud-engineer}
[**Engineers & Researchers**](for-engineers.qmd)

Pre-deployment estimates for any architecture. Model distributed overheads, LLM serving latency, and multi-region sustainability before provisioning hardware. [See quick API guide →](for-engineers.qmd)
:::

:::

:::

<!-- CITATION -->
::: {.im-section .im-section-last}

::: {.im-section-header}
### Citation
:::

If you use MLSys·im in coursework or research, please cite:

```bibtex
@article{mlsysim2025,
  title       = {{MLSys·im}: First-Principles Infrastructure Modeling
                 for Machine Learning Systems},
  author      = {Reddi, Vijay Janapa},
  year        = {2025},
  institution = {Harvard University},
  url         = {https://mlsysbook.ai/mlsysim}
}
```

:::

:::