mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-03-11 17:49:25 -05:00

Vijay Janapa Reddi a78f1bd8b0 feat(mlsysim): add documentation site, typed registries, and 6-solver core

Complete MLSYSIM v0.1.0 implementation with:

- Documentation website (Quarto): landing page with animated hero
  and capability carousel, 4 tutorials (hello world, LLM serving,
  distributed training, sustainability), hardware/model/fleet/infra
  catalogs, solver guide, whitepaper, math foundations, glossary,
  and full quartodoc API reference
- Typed registry system: Hardware (18 devices across 5 tiers),
  Models (15 workloads), Systems (fleets, clusters, fabrics),
  Infrastructure (grid profiles, rack configs, datacenters)
- Core types: Pint-backed Quantity, Metadata provenance tracking,
  custom exception hierarchy (OOMError, SLAViolation)
- SimulationConfig with YAML/JSON loading and pre-validation
- Scenario system tying workloads to systems with SLA constraints
- Multi-level evaluation scorecard (feasibility, performance, macro)
- Examples, tests, and Jetson Orin NX spec fix (100 → 25 TFLOP/s)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-07 15:59:51 -05:00

3.3 KiB

mlsysim: Volume 2 "Farm to Scale" Plan

This document tracks the systematic build-out of the advanced features for Volume 2 of the Machine Learning Systems textbook.

📅 Roadmap Overview

| Feature | Status | Priority | Goal |
|---|---|---|---|
| LLM Serving & KV-Cache | ✅ | P0 | Model TTFT, ITL, and memory footprint of LLM inference. |
| 3D Parallelism Solver | ✅ | P1 | Model TP/PP bubbles for massive Frontier-scale training. |
| Network Bisection & Oversubscription | ✅ | P1 | Model congestion in non-blocking and oversubscribed fabrics. |
| Concrete Hardware Registry | ✅ | P1 | Replace generics with real-world devices (iPhone, H200, etc). |
| Empirical Validation Suite | ⬜ | P2 | Build `test_empirical.py` against MLPerf benchmarks. |
| Tail Latency Physics | ⬜ | P2 | Calculate P99/P99.9 using queueing theory. |
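
As a rough illustration of the planned Tail Latency Physics item, the sketch below computes latency percentiles for an M/M/1 queue, where the time a request spends in the system is exponentially distributed with rate (service rate minus arrival rate). The choice of M/M/1 and the function name `mm1_latency_percentile` are assumptions made for this example; the roadmap does not say which queueing model mlsysim will adopt.

```python
import math


def mm1_latency_percentile(arrival_rate: float, service_rate: float,
                           percentile: float) -> float:
    """Latency (seconds) below which `percentile` of requests complete.

    Hypothetical helper, not part of mlsysim. Assumes an M/M/1 queue:
    Poisson arrivals, exponential service times, one server, FIFO order.
    Rates are in requests per second.
    """
    if not 0.0 < percentile < 1.0:
        raise ValueError("percentile must be strictly between 0 and 1")
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival_rate >= service_rate")
    # Sojourn time T ~ Exp(service_rate - arrival_rate), so
    # P(T <= t) = 1 - exp(-(mu - lambda) * t); solve for t.
    return -math.log(1.0 - percentile) / (service_rate - arrival_rate)


# Example: 80 req/s arriving at a server that can handle 100 req/s.
p99 = mm1_latency_percentile(80.0, 100.0, 0.99)
p999 = mm1_latency_percentile(80.0, 100.0, 0.999)
```

With these inputs the mean latency is 1/(100 - 80) = 50 ms, while P99 is ln(100)/20, roughly 230 ms, which shows how far the tail sits from the mean even at 80% utilization.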

✅ Systematic Execution Log

2025-03-06: Infrastructure Foundations Complete

Completed the refactor to the 5-layer Pydantic stack (Layers A-E).
Implemented the baseline DistributedSolver and EconomicsSolver.
Fixed generate_appendix.py so that it extracts data correctly from the new registry.
Verified that all Volume 1 & 2 book invariants hold after the structural refactor.

2025-03-06: LLM Serving & KV-Cache [COMPLETED]

Implemented ServingSolver in mlsysim.core.solver supporting Pre-fill and Decoding phases.
Added heads, kv_heads, and hidden_dim to TransformerWorkload.
Implemented get_kv_cache_size method for dynamic memory calculation.
Verified against Llama-3-70B on H100 (detected infeasibility for single-node FP16).
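
The KV-cache calculation and the feasibility check described above can be sketched as follows. This is a minimal sketch, not the mlsysim implementation: `TransformerShape`, `kv_cache_bytes`, and `fits_in_memory` are hypothetical names, and the only part taken from the log is the use of heads, kv_heads, and hidden_dim. The Llama-3-70B shape and the 80 GB H100 capacity are supplied here as assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerShape:
    """Hypothetical stand-in for the fields added to TransformerWorkload."""
    layers: int
    heads: int
    kv_heads: int
    hidden_dim: int
    params: float  # total parameter count

    @property
    def head_dim(self) -> int:
        return self.hidden_dim // self.heads


def kv_cache_bytes(m: TransformerShape, seq_len: int, batch_size: int,
                   bytes_per_elem: int = 2) -> int:
    """KV-cache size in bytes; the leading 2 counts the K and V tensors."""
    return (2 * m.layers * m.kv_heads * m.head_dim
            * seq_len * batch_size * bytes_per_elem)


def fits_in_memory(m: TransformerShape, seq_len: int, batch_size: int,
                   memory_bytes: float, bytes_per_elem: int = 2) -> bool:
    """First-order check: weights plus KV cache must fit in device memory."""
    weight_bytes = m.params * bytes_per_elem
    total = weight_bytes + kv_cache_bytes(m, seq_len, batch_size, bytes_per_elem)
    return total <= memory_bytes


# Assumed values: approximate Llama-3-70B shape (grouped-query attention
# with 8 KV heads) and one 80 GB H100.
LLAMA3_70B = TransformerShape(layers=80, heads=64, kv_heads=8,
                              hidden_dim=8192, params=70e9)
H100_MEMORY_BYTES = 80e9

cache = kv_cache_bytes(LLAMA3_70B, seq_len=4096, batch_size=1)
feasible = fits_in_memory(LLAMA3_70B, 4096, 1, H100_MEMORY_BYTES)
```

Under these assumptions the cache costs 320 KiB per token, about 1.34 GB at 4,096 tokens, and the FP16 weights alone (about 140 GB) exceed the 80 GB of a single H100, so `feasible` is `False`.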

2025-03-06: 3D Parallelism & Network Congestion [COMPLETED]

Upgraded DistributedSolver to support Tensor Parallelism (TP) and Pipeline Parallelism (PP).
Implemented the pipeline bubble formula, (P-1)/(P-1+M), where P is the number of pipeline stages and M is the number of microbatches.
Added oversubscription_ratio to NetworkFabric and integrated it into communication math.
Added comprehensive NumPy-style docstrings to all solvers in mlsysim.core.solver.
Verified against a Frontier-8K H100 cluster scenario.
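
The two quantities introduced in this entry can be sketched in a few lines. The bubble function follows the formula quoted above, with P pipeline stages and M microbatches. How mlsysim folds `oversubscription_ratio` into its communication math is not stated here, so dividing link bandwidth by the ratio is an assumption (a common first-order model), and both function names are hypothetical.

```python
def pipeline_bubble_fraction(stages: int, microbatches: int) -> float:
    """Fraction of pipeline time spent idle: (P - 1) / (P - 1 + M)."""
    if stages < 1 or microbatches < 1:
        raise ValueError("stages and microbatches must be >= 1")
    return (stages - 1) / (stages - 1 + microbatches)


def effective_bandwidth(link_bandwidth: float,
                        oversubscription_ratio: float) -> float:
    """Assumed model: an N:1 oversubscribed fabric delivers 1/N of the
    link bandwidth under full load. A ratio of 1.0 means non-blocking."""
    if oversubscription_ratio < 1.0:
        raise ValueError("oversubscription_ratio must be >= 1.0")
    return link_bandwidth / oversubscription_ratio


# Example: 8 pipeline stages, 32 microbatches, 400 Gb/s links at 2:1.
bubble = pipeline_bubble_fraction(stages=8, microbatches=32)
bandwidth = effective_bandwidth(400.0, 2.0)
```

With 8 stages and 32 microbatches the bubble is 7/39, about 18% of the step; raising the microbatch count shrinks it, and a single stage has no bubble at all.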

2025-03-06: Concrete Hardware & Narrative Scenarios [COMPLETED]

Replaced generic placeholders with 15+ real-world devices including iPhone 15 Pro, MacBook M3 Max, and NVIDIA H200.
Implemented the Lighthouse Archetype scenarios (Doorbell, AV, Frontier) with built-in SLA validation.
Created the Hierarchy of Constraints SystemEvaluation scorecard.
Established Engineering & Modeling Best Practices in BEST_PRACTICES.md.
Created Hello World and Manual Sweep tutorials for students.

🛠 Feature Specs

[P0] LLM Serving & KV-Cache

Input: model: TransformerWorkload, hardware: HardwareNode, seq_len: int, batch_size: int.
Output: latency_prefill, latency_decoding, total_kv_cache_gb, feasible_on_hardware.
Validation: Predictions must fall within 10% of vLLM benchmark results for Llama-3-70B on H100.
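
To make the spec concrete, the sketch below returns the four listed outputs from a first-order roofline-style estimate and includes a helper for the 10% validation check. It assumes prefill is compute-bound (about 2 FLOPs per parameter per token) and decoding is memory-bound (each step streams the weights and the KV cache once). `ServingEstimate`, `estimate_serving`, and `within_tolerance` are hypothetical names, the scalar arguments stand in for the spec's `model` and `hardware` objects, and none of this is confirmed to match the actual ServingSolver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingEstimate:
    latency_prefill: float       # seconds for the whole prompt batch
    latency_decoding: float      # seconds per generated token
    total_kv_cache_gb: float     # decimal gigabytes
    feasible_on_hardware: bool


def estimate_serving(params: float, kv_bytes_per_token: float,
                     seq_len: int, batch_size: int,
                     peak_flops: float, mem_bandwidth: float,
                     mem_capacity: float,
                     bytes_per_param: int = 2) -> ServingEstimate:
    """First-order estimate; ignores kernel efficiency and communication."""
    weight_bytes = params * bytes_per_param
    kv_bytes = kv_bytes_per_token * seq_len * batch_size
    # Prefill: assumed compute-bound, ~2 FLOPs per parameter per token.
    latency_prefill = 2 * params * seq_len * batch_size / peak_flops
    # Decode: assumed memory-bound, one pass over weights and KV cache.
    latency_decoding = (weight_bytes + kv_bytes) / mem_bandwidth
    return ServingEstimate(
        latency_prefill=latency_prefill,
        latency_decoding=latency_decoding,
        total_kv_cache_gb=kv_bytes / 1e9,
        feasible_on_hardware=weight_bytes + kv_bytes <= mem_capacity,
    )


def within_tolerance(predicted: float, measured: float,
                     rel_tol: float = 0.10) -> bool:
    """True when the prediction is within rel_tol of the measurement."""
    return abs(predicted - measured) <= rel_tol * abs(measured)


# Toy inputs chosen for easy arithmetic, not real hardware figures.
est = estimate_serving(params=1e9, kv_bytes_per_token=1000,
                       seq_len=100, batch_size=2,
                       peak_flops=1e12, mem_bandwidth=1e12,
                       mem_capacity=4e9)
```

An empirical test would then compare `est.latency_prefill` and `est.latency_decoding` against measured values with `within_tolerance`, failing when either falls outside the 10% band.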

🛡 Verification Standard ("No Hallucination")

Unit Tests: Every feature must have a corresponding test in mlsysim/tests/.
Empirical Anchor: Every formula must cite a standard industry paper (e.g., "Efficient Memory Management for Large Language Model Serving with PagedAttention").
Dimensional Integrity: pint must resolve all results to correct SI units.
