cs249r_book/mlsysim/README.md
Vijay Janapa Reddi a78f1bd8b0 feat(mlsysim): add documentation site, typed registries, and 6-solver core
Complete MLSYSIM v0.1.0 implementation with:

- Documentation website (Quarto): landing page with animated hero
  and capability carousel, 4 tutorials (hello world, LLM serving,
  distributed training, sustainability), hardware/model/fleet/infra
  catalogs, solver guide, whitepaper, math foundations, glossary,
  and full quartodoc API reference
- Typed registry system: Hardware (18 devices across 5 tiers),
  Models (15 workloads), Systems (fleets, clusters, fabrics),
  Infrastructure (grid profiles, rack configs, datacenters)
- Core types: Pint-backed Quantity, Metadata provenance tracking,
  custom exception hierarchy (OOMError, SLAViolation)
- SimulationConfig with YAML/JSON loading and pre-validation
- Scenario system tying workloads to systems with SLA constraints
- Multi-level evaluation scorecard (feasibility, performance, macro)
- Examples, tests, and Jetson Orin NX spec fix (100 → 25 TFLOP/s)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 15:59:51 -05:00


🚀 mlsysim: The ML Systems Modeling Platform

mlsysim is the high-performance, physics-grounded analytical simulator powering the Machine Learning Systems textbook ecosystem. It provides a unified "Single Source of Truth" (SSoT) for modeling systems from sub-watt microcontrollers to exaflop-scale global fleets.


🏗 The 5-Layer Analytical Stack

mlsysim implements a "Progressive Lowering" architecture, separating high-level workloads from the physical infrastructure that executes them.

Layer A: Workload Representation (mlsysim.models)

High-level model definitions (TransformerWorkload, CNNWorkload).

  • Math: FLOPs, parameter counts, and arithmetic intensity.
  • Key Models: Models.Llama3_70B, Models.GPT3, Models.ResNet50.
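The Layer A quantities can be estimated by hand. The sketch below is a back-of-envelope illustration, not mlsysim's implementation. It assumes a dense transformer, the common ~2N FLOPs-per-token forward pass and ~6ND training rules of thumb, and it ignores attention FLOPs and KV-cache and activation traffic. All function names are hypothetical.

```python
# Back-of-envelope sketch; function names are hypothetical, not mlsysim's API.
# Assumes a dense transformer and ignores attention FLOPs and activation traffic.


def forward_flops_per_token(n_params: float) -> float:
    """~2 FLOPs per parameter per token (one multiply and one add per weight)."""
    return 2.0 * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """~6*N*D rule of thumb: forward (2N) plus backward (4N) per token."""
    return 6.0 * n_params * n_tokens


def decode_arithmetic_intensity(batch_size: int, bytes_per_param: float) -> float:
    """FLOPs per byte of weight traffic for one decode step.

    Each step reads every weight once (N * bytes_per_param bytes) and does
    2 * N FLOPs per sequence in the batch, so N cancels out.
    """
    return 2.0 * batch_size / bytes_per_param


n = 70e9  # a 70B-parameter model (illustrative)
print(f"{forward_flops_per_token(n):.2e} FLOPs per generated token")  # 1.40e+11
print(f"{training_flops(n, 15e12):.2e} FLOPs for 15T training tokens")  # 6.30e+24
print(decode_arithmetic_intensity(1, 2.0))   # 1.0 FLOP/byte: batch 1, 16-bit weights
print(decode_arithmetic_intensity(64, 2.0))  # 64.0 FLOP/byte: batching raises intensity
```

At batch 1 the intensity (about 1 FLOP/byte) sits far below the compute-to-bandwidth ratio of modern accelerators, which is why single-stream decoding is memory-bound.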

Layer B: Hardware Registry (mlsysim.hardware)

Precise, concrete specifications for real-world silicon.

  • Cloud: Hardware.H100, Hardware.H200, Hardware.MI300X, Hardware.TPUv5p.
  • Mobile/Workstation: Hardware.iPhone, Hardware.Snapdragon, Hardware.MacBookM3Max.
  • Edge/Tiny: Hardware.Jetson, Hardware.TeslaFSD, Hardware.ESP32, Hardware.Arduino.
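The dotted access (`Hardware.H100`) suggests typed, immutable records. A minimal sketch of that pattern, assuming a frozen dataclass with hypothetical field names. The numbers are approximate public H100 SXM figures (dense BF16, HBM3) and may not match the values mlsysim ships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareSpec:
    """Hypothetical typed hardware record; field names are illustrative."""

    name: str
    peak_flops: float     # FLOP/s at the chosen precision
    mem_bandwidth: float  # bytes/s
    mem_capacity: float   # bytes

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) where the compute and memory roofs meet."""
        return self.peak_flops / self.mem_bandwidth


class Hardware:
    """Namespace of immutable specs, mimicking `Hardware.H100`-style access."""

    # Approximate public H100 SXM figures; assumed, not read from mlsysim.
    H100 = HardwareSpec("H100 SXM", peak_flops=989e12, mem_bandwidth=3.35e12, mem_capacity=80e9)


print(f"{Hardware.H100.ridge_point:.0f} FLOP/byte")  # 295
```

Freezing the dataclass means a stray assignment such as `Hardware.H100.peak_flops = 1.0` raises `FrozenInstanceError`, which helps keep a shared spec consistent everywhere it is used.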

Layer C: Infrastructure & Environment (mlsysim.infra)

Regional grid profiles and datacenter sustainability.

  • Math: PUE (Power Usage Effectiveness), carbon intensity (gCO₂/kWh), and WUE (Water Usage Effectiveness, L/kWh).
  • Grids: Infra.Quebec, Infra.Poland, Infra.US_Avg.
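These Layer C metrics scale energy multiplicatively. A minimal operational-footprint sketch, assuming constant IT power at a fixed utilization, PUE applied to IT energy for carbon, and site WUE (liters per kWh of IT energy) for water. Embodied carbon is ignored. The function and the grid intensity used are hypothetical, not values from mlsysim's Infra registry.

```python
HOURS_PER_YEAR = 365 * 24


def annual_footprint(it_power_w: float, utilization: float, pue: float,
                     carbon_g_per_kwh: float, wue_l_per_kwh: float) -> tuple[float, float]:
    """Return (operational carbon in kg CO2e, site water in liters) for one year."""
    it_kwh = it_power_w * utilization * HOURS_PER_YEAR / 1000.0  # IT energy
    facility_kwh = it_kwh * pue                                   # adds cooling, power delivery
    carbon_kg = facility_kwh * carbon_g_per_kwh / 1000.0          # grams -> kilograms
    water_l = it_kwh * wue_l_per_kwh                              # WUE is defined per IT kWh
    return carbon_kg, water_l


# Illustrative inputs: a 700 W accelerator at full load, PUE 1.1,
# a 400 gCO2/kWh grid, and a site WUE of 1.8 L/kWh.
carbon, water = annual_footprint(700, 1.0, 1.1, 400, 1.8)
print(f"{carbon:.0f} kg CO2e, {water:.0f} L water per year")  # 2698 kg CO2e, 11038 L water per year
```

Because carbon scales linearly with grid intensity, moving the same job between grid profiles such as `Infra.Quebec` and `Infra.Poland` changes operational carbon by the ratio of their intensities.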

Layer D: Systems & Topology (mlsysim.systems)

Fleet configurations, network fabrics, and narrative scenarios.

  • Scenarios: Applications.Doorbell, Applications.AutoDrive, Applications.Frontier.
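Fleet and fabric choices show up most directly in collective communication. As a sketch of the kind of cost a fabric model resolves, the standard ring all-reduce sends 2(p-1)/p of the message per rank over p-1 reduce-scatter and p-1 all-gather steps. The oversubscription handling below (dividing link bandwidth by the ratio) is a simplifying assumption, and all names and numbers are hypothetical, not mlsysim's.

```python
def ring_allreduce_seconds(message_bytes: float, n_ranks: int,
                           link_bw_bytes_per_s: float, latency_s: float = 0.0,
                           oversubscription: float = 1.0) -> float:
    """Alpha-beta estimate of a ring all-reduce across n_ranks."""
    if n_ranks < 2:
        return 0.0  # nothing to exchange
    # Assumption: an oversubscribed tier divides the usable per-link bandwidth.
    effective_bw = link_bw_bytes_per_s / oversubscription
    steps = 2 * (n_ranks - 1)                         # reduce-scatter + all-gather
    bytes_per_rank = steps * message_bytes / n_ranks  # each step sends one 1/p chunk
    return bytes_per_rank / effective_bw + steps * latency_s


# Illustrative: 7B BF16 gradients (14 GB) over 8 ranks on 50 GB/s links.
grad_bytes = 7e9 * 2
print(f"{ring_allreduce_seconds(grad_bytes, 8, 50e9):.2f} s")  # 0.49 s
print(f"{ring_allreduce_seconds(grad_bytes, 8, 50e9, oversubscription=2.0):.2f} s")  # 0.98 s
```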

Layer E: Execution & Solvers (mlsysim.core.solver)

The physics-grounded solvers that resolve the hierarchy of constraints.

  • SingleNodeSolver: Roofline and Iron Law performance.
  • ServingSolver: LLM prefill vs. decode phases and KV-cache growth.
  • DistributedSolver: 3D parallelism (tensor, pipeline, and data: TP/PP/DP) and network oversubscription.
  • SustainabilitySolver: Carbon footprint and water usage.
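The roofline bound behind single-node performance, and the KV-cache growth that dominates serving memory, both reduce to a few lines of arithmetic. A minimal sketch, not the solvers' code. It assumes runtime = max(compute time, memory time) with perfect overlap, ignores attention FLOPs, and uses an 8B-class grouped-query-attention config (32 layers, 8 KV heads, head dim 128, 16-bit weights and cache) with approximate H100 SXM figures (989 TFLOP/s BF16, 3.35 TB/s). All names are hypothetical.

```python
def roofline_seconds(flops: float, bytes_moved: float,
                     peak_flops: float, mem_bw: float) -> tuple[float, str]:
    """Roofline lower bound on runtime, and which roof binds."""
    compute_s = flops / peak_flops
    memory_s = bytes_moved / mem_bw
    if compute_s >= memory_s:
        return compute_s, "compute"
    return memory_s, "memory"


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """K and V tensors for every layer, KV head, and cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


PEAK, BW = 989e12, 3.35e12          # approximate H100 SXM: BF16 FLOP/s, HBM bytes/s
N_PARAMS, WEIGHT_BYTES = 8e9, 16e9  # 8B-class model in 16-bit precision
PROMPT = 8192

kv = kv_cache_bytes(32, 8, 128, PROMPT, batch=1)  # 2**30 bytes = 1 GiB

# Prefill: all prompt tokens share one pass over the weights -> high intensity.
prefill = roofline_seconds(2 * N_PARAMS * PROMPT, WEIGHT_BYTES, PEAK, BW)
# Decode: one token per step, but the weights and whole KV cache are re-read.
decode = roofline_seconds(2 * N_PARAMS, WEIGHT_BYTES + kv, PEAK, BW)

print(f"KV cache: {kv / 2**30:.1f} GiB")  # 1.0 GiB
print(f"prefill: {prefill[0] * 1e3:.1f} ms ({prefill[1]}-bound)")
print(f"decode:  {decode[0] * 1e3:.2f} ms/token ({decode[1]}-bound)")
```

Because the cache grows linearly with sequence length and batch, the same 8K-token context at batch 32 would need 32 GiB of cache alone, which is why KV-cache growth rather than weights often sets the serving memory limit.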

🚀 Quick Usage: The System Evaluation

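The primary way to use mlsysim is to evaluate a scenario against the Hierarchy of Constraints: feasibility (does the model fit in memory?), performance (does it meet its latency target?), and macro/economics (carbon and TCO).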

```python
import mlsysim

# 1. Pick a Lighthouse Scenario
scenario = mlsysim.Applications.Doorbell

# 2. Run a Multi-Level Evaluation
evaluation = scenario.evaluate()

# 3. View the Scorecard
print(evaluation.scorecard())
```

Example Scorecard Output:

```text
=== SYSTEM EVALUATION: Smart Doorbell ===
Level 1: Feasibility -> [PASS]
   Model fits in memory (0.5 MB / 0.5 MB)
Level 2: Performance -> [PASS]
   Latency: 105.00 ms (Target: 200 ms)
Level 3: Macro/Economics -> [PASS]
   Annual Carbon: 5.1 kg | TCO: $31,501
```
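The scorecard reads as an ordered set of gates. A minimal sketch of that idea, assuming (this README does not confirm it) that a feasibility failure makes the later levels moot. The dataclass, function, and thresholds are hypothetical, and the macro/economics level is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class LevelResult:
    level: str
    passed: bool
    detail: str


def evaluate(model_bytes: float, device_mem_bytes: float,
             latency_s: float, target_latency_s: float) -> list[LevelResult]:
    """Check constraints in order, stopping once a lower level fails (assumed)."""
    results = []
    fits = model_bytes <= device_mem_bytes
    results.append(LevelResult("Feasibility", fits,
                               f"{model_bytes / 1e6:.2f} MB / {device_mem_bytes / 1e6:.2f} MB"))
    if not fits:
        return results  # latency is meaningless if the model cannot be loaded
    results.append(LevelResult("Performance", latency_s <= target_latency_s,
                               f"{latency_s * 1e3:.0f} ms (target {target_latency_s * 1e3:.0f} ms)"))
    return results


for r in evaluate(0.4e6, 0.5e6, latency_s=0.105, target_latency_s=0.2):
    print(f"{r.level}: {'PASS' if r.passed else 'FAIL'}  {r.detail}")
```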

🛡 Stability & Integrity

Because this core powers a printed textbook, we enforce strict invariant verification: every physical constant is traceable to a primary source (a datasheet or paper), and dimensional consistency is enforced with Pint.

🛠 Installation

```bash
pip install -e .
```