mlsysim: Volume 2 "Farm to Scale" Plan
This document tracks the systematic build-out of the advanced features for Volume 2 of the Machine Learning Systems textbook.
📅 Roadmap Overview
| Feature | Status | Priority | Goal |
|---|---|---|---|
| LLM Serving & KV-Cache | ✅ | P0 | Model TTFT, ITL, and memory footprint of LLM inference. |
| 3D Parallelism Solver | ✅ | P1 | Model TP/PP bubbles for massive Frontier-scale training. |
| Network Bisection & Oversubscription | ✅ | P1 | Model congestion in non-blocking and oversubscribed fabrics. |
| Concrete Hardware Registry | ✅ | P1 | Replace generics with real-world devices (iPhone, H200, etc). |
| Empirical Validation Suite | ⬜ | P2 | Build test_empirical.py against MLPerf benchmarks. |
| Tail Latency Physics | ⬜ | P2 | Calculate P99/P99.9 using queueing theory. |
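As a rough illustration of what the Tail Latency Physics item could compute, the sketch below uses the simplest queueing model, M/M/1, where the response time is exponentially distributed with rate μ − λ. The choice of M/M/1 and the function name `mm1_latency_percentile` are assumptions for illustration only; the roadmap does not say which queueing model mlsysim will use.

```python
import math


def mm1_latency_percentile(arrival_rate: float, service_rate: float, q: float) -> float:
    """Return the q-th quantile of response time (seconds) for an M/M/1 queue.

    Hypothetical helper, not part of mlsysim. For M/M/1 the sojourn time
    is exponential with rate (service_rate - arrival_rate), so the
    quantile is -ln(1 - q) / (service_rate - arrival_rate).
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    if arrival_rate < 0 or arrival_rate >= service_rate:
        raise ValueError("queue is unstable unless 0 <= arrival_rate < service_rate")
    return -math.log(1.0 - q) / (service_rate - arrival_rate)


# Example: 80 requests/s arriving at a server that can handle 100 requests/s.
p99 = mm1_latency_percentile(80.0, 100.0, 0.99)
p999 = mm1_latency_percentile(80.0, 100.0, 0.999)
print(f"P99 = {p99 * 1e3:.1f} ms, P99.9 = {p999 * 1e3:.1f} ms")
```

The tail grows without bound as the arrival rate approaches the service rate, which is why percentile targets constrain utilization far more tightly than mean latency does.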
✅ Systematic Execution Log
2025-03-06: Infrastructure Foundations Complete
- Completed the refactor to the 5-layer Pydantic stack (Layers A-E).
- Implemented the baseline `DistributedSolver` and `EconomicsSolver`.
- Fixed `generate_appendix.py` to correctly extract data from the new registry.
- Verified that all Volume 1 & 2 book invariants hold after the structural refactor.
2025-03-06: LLM Serving & KV-Cache [COMPLETED]
- Implemented `ServingSolver` in `mlsysim.core.solver` supporting Pre-fill and Decoding phases.
- Added `heads`, `kv_heads`, and `hidden_dim` to `TransformerWorkload`.
- Implemented `get_kv_cache_size` method for dynamic memory calculation.
- Verified against Llama-3-70B on H100 (detected infeasibility for single-node FP16).
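The KV-cache calculation and the feasibility check above can be sketched with the standard sizing formula: two tensors (K and V) per layer, each of shape `kv_heads × head_dim` per token. This is a minimal standalone sketch, not the actual `get_kv_cache_size` implementation; the function name `kv_cache_bytes`, its signature, and the `layers` argument are assumptions. The Llama-3-70B shape values (80 layers, 64 heads, 8 KV heads, hidden dimension 8192) are the published model configuration, and the parameter count is rounded to 70e9.

```python
def kv_cache_bytes(layers: int, heads: int, kv_heads: int, hidden_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache in bytes (hypothetical helper, not the mlsysim API).

    bytes = 2 (K and V) * layers * kv_heads * head_dim * seq_len * batch * dtype size
    """
    head_dim = hidden_dim // heads
    return 2 * layers * kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Llama-3-70B in FP16 (2 bytes per element) at an 8192-token context, batch 1.
cache = kv_cache_bytes(layers=80, heads=64, kv_heads=8, hidden_dim=8192,
                       seq_len=8192, batch_size=1)
weights = int(70e9) * 2            # approximate parameter count times 2 bytes
h100_capacity = int(80e9)          # 80 GB of HBM on a single H100

print(f"KV cache: {cache / 2**30:.2f} GiB")   # 2,684,354,560 bytes = 2.50 GiB
print(f"Fits on one H100: {weights + cache <= h100_capacity}")
```

The weights alone (about 140 GB in FP16) exceed a single 80 GB device, so the check fails before the cache is even counted, consistent with the infeasibility noted in the last bullet.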
2025-03-06: 3D Parallelism & Network Congestion [COMPLETED]
- Upgraded `DistributedSolver` to support Tensor Parallelism (TP) and Pipeline Parallelism (PP).
- Implemented the Pipeline Bubble formula (`(P-1)/(P-1+M)`).
- Added `oversubscription_ratio` to `NetworkFabric` and integrated it into communication math.
- Added comprehensive NumPy-style docstrings to all solvers in `mlsysim.core.solver`.
- Verified against a Frontier-8K H100 cluster scenario.
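The bubble formula and the oversubscription term can be illustrated with a few lines of Python. In the formula, `P` is the number of pipeline stages and `M` the number of microbatches. How `oversubscription_ratio` actually enters the solver's communication math is not stated here, so dividing link bandwidth by the ratio is an assumption, as are all function names below; the ring all-reduce cost model is one common choice and is shown only for context.

```python
def pipeline_bubble_fraction(pp_stages: int, microbatches: int) -> float:
    """Fraction of pipeline time spent idle: (P - 1) / (P - 1 + M)."""
    if pp_stages < 1 or microbatches < 1:
        raise ValueError("pp_stages and microbatches must be >= 1")
    return (pp_stages - 1) / (pp_stages - 1 + microbatches)


def effective_bandwidth(link_bw: float, oversubscription_ratio: float) -> float:
    """Assumed model: an N:1 oversubscribed fabric delivers 1/N of link bandwidth."""
    if oversubscription_ratio < 1.0:
        raise ValueError("oversubscription_ratio must be >= 1 (1 means non-blocking)")
    return link_bw / oversubscription_ratio


def ring_allreduce_seconds(message_bytes: float, n_ranks: int, bw: float) -> float:
    """Bandwidth term of a ring all-reduce: 2 * (N - 1) / N * bytes / bandwidth."""
    return 2 * (n_ranks - 1) / n_ranks * message_bytes / bw


# 8 pipeline stages with 32 microbatches: 7 / 39 of the time is bubble.
print(f"bubble fraction: {pipeline_bubble_fraction(8, 32):.3f}")

# 1 GB all-reduce over 8 ranks on 50 GB/s links, non-blocking vs 2:1 oversubscribed.
for ratio in (1.0, 2.0):
    bw = effective_bandwidth(50e9, ratio)
    print(f"ratio {ratio}: {ring_allreduce_seconds(1e9, 8, bw) * 1e3:.1f} ms")
```

Raising `M` shrinks the bubble but leaves communication cost untouched, while the oversubscription ratio scales communication time linearly under this assumed model.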
2025-03-06: Concrete Hardware & Narrative Scenarios [COMPLETED]
- Replaced generic placeholders with 15+ real-world devices including iPhone 15 Pro, MacBook M3 Max, and NVIDIA H200.
- Implemented the Lighthouse Archetype scenarios (Doorbell, AV, Frontier) with built-in SLA validation.
- Created the Hierarchy of Constraints `SystemEvaluation` scorecard.
- Established Engineering & Modeling Best Practices in `BEST_PRACTICES.md`.
- Created Hello World and Manual Sweep tutorials for students.
🛠 Feature Specs
[P0] LLM Serving & KV-Cache
- Input: `model: TransformerWorkload`, `hardware: HardwareNode`, `seq_len: int`, `batch_size: int`.
- Output: `latency_prefill`, `latency_decoding`, `total_kv_cache_gb`, `feasible_on_hardware`.
- Validation: Must match vLLM benchmark results for Llama-3-70B on H100 (within 10%).
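To make the input/output contract above concrete, here is a minimal roofline-style sketch that produces the four output fields from the four inputs. It is not the `ServingSolver` implementation: the `Workload` and `Node` dataclasses are simplified, hypothetical stand-ins for `TransformerWorkload` and `HardwareNode`, the function name `solve_serving` is invented, and the latency model (compute-bound prefill at roughly 2 FLOPs per parameter per token, memory-bound decode that reads weights plus cache once per token) is a common first-order approximation assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:            # hypothetical stand-in for TransformerWorkload
    params: float          # parameter count
    layers: int
    heads: int
    kv_heads: int
    hidden_dim: int
    bytes_per_elem: int = 2


@dataclass(frozen=True)
class Node:                # hypothetical stand-in for HardwareNode
    peak_flops: float      # FLOP/s
    mem_bw: float          # bytes/s
    mem_capacity: float    # bytes


@dataclass(frozen=True)
class ServingResult:
    latency_prefill: float       # seconds for the whole prompt
    latency_decoding: float      # seconds per generated token
    total_kv_cache_gb: float
    feasible_on_hardware: bool


def solve_serving(model: Workload, hardware: Node,
                  seq_len: int, batch_size: int) -> ServingResult:
    head_dim = model.hidden_dim // model.heads
    kv_bytes = (2 * model.layers * model.kv_heads * head_dim
                * seq_len * batch_size * model.bytes_per_elem)
    weight_bytes = model.params * model.bytes_per_elem

    # Prefill: max of compute time and the time to stream the weights once.
    prefill_flops = 2 * model.params * seq_len * batch_size
    latency_prefill = max(prefill_flops / hardware.peak_flops,
                          weight_bytes / hardware.mem_bw)

    # Decode: each step reads all weights and the full KV cache from memory.
    latency_decoding = (weight_bytes + kv_bytes) / hardware.mem_bw

    feasible = weight_bytes + kv_bytes <= hardware.mem_capacity
    return ServingResult(latency_prefill, latency_decoding, kv_bytes / 1e9, feasible)


# Toy numbers chosen for easy arithmetic, not a real device or model.
toy_model = Workload(params=1e9, layers=10, heads=8, kv_heads=8, hidden_dim=1024)
toy_node = Node(peak_flops=1e14, mem_bw=1e12, mem_capacity=16e9)
print(solve_serving(toy_model, toy_node, seq_len=1000, batch_size=1))
```

A first-order model like this is only a starting point for the 10% validation target; effects such as kernel efficiency and batching policy are not captured.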
🛡 Verification Standard ("No Hallucination")
- Unit Tests: Every feature must have a corresponding test in `mlsysim/tests/`.
- Empirical Anchor: Formulas must be cited from standard industry papers (e.g., "The Case for PagedAttention").
- Dimensional Integrity: `pint` must resolve all results to correct SI units.