mlsysim: The Architecture & Development Plan
Vision: The MIPS/SPIM for Machine Learning Systems
mlsysim is a first-order analytical simulator for AI infrastructure. Just as Hennessy and Patterson used the MIPS architecture and SPIM simulator to teach the physics of instruction pipelining, mlsysim teaches the physics of tensor movement, memory hierarchies, and distributed fleet dynamics.
1. Core Architecture (The 5-Layer Stack) - [COMPLETED]
- Layer A: Workload Representation: High-level model definitions.
- Layer B: Hardware Registry: Concrete specs for real-world devices (H100, iPhone, ESP32).
- Layer C: Infrastructure & Environment: Regional grids and PUE models.
- Layer D: Systems & Topology: Fleet configurations and narrative Scenarios.
- Layer E: Execution & Solvers: Pluggable solvers for Performance, Serving, and Economics.
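The five layers above can be sketched as a minimal composition in plain Python. All class and field names here are illustrative placeholders, not the actual mlsysim API; the point is only how a Layer E solver consumes objects from Layers A-D.

```python
# Hypothetical sketch of the 5-layer composition (names are illustrative,
# not the real mlsysim classes).
from dataclasses import dataclass

@dataclass
class Workload:          # Layer A: high-level model definition
    name: str
    params_b: float      # parameter count, in billions

@dataclass
class Device:            # Layer B: concrete hardware spec
    name: str
    tflops: float        # dense peak throughput, TFLOP/s

@dataclass
class Grid:              # Layer C: regional environment
    region: str
    pue: float           # power usage effectiveness

@dataclass
class Fleet:             # Layer D: system topology
    device: Device
    count: int

def naive_throughput(w: Workload, f: Fleet) -> float:
    """Layer E: a toy solver -- aggregate peak TFLOP/s over the fleet."""
    return f.device.tflops * f.count

fleet = Fleet(Device("H100", 989.0), count=8)
print(naive_throughput(Workload("llama-70b", 70.0), fleet))  # 7912.0
```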
2. Systematic Record of Execution
Phase 1: Core API & The Ontology [COMPLETED - 2025-03-06]
- Migrated from the monolithic `core` to the 5-layer Pydantic-powered structure.
- Implemented `Quantity` types with strict validation and JSON serialization.
Phase 2: Volume 2 "Farm to Scale" Core [COMPLETED - 2025-03-06]
- 3D Parallelism: Implemented `DistributedSolver` with TP/PP/DP and Pipeline Bubble math.
- LLM Serving: Implemented `ServingSolver` with KV-Cache footprint and Pre-fill/Decode phases.
- Network Physics: Added Oversubscription Ratios and Bisection BW logic.
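The first-order formulas behind two of these solvers are standard textbook expressions: the 1F1B pipeline bubble fraction from the Megatron-LM schedule, and the per-token KV-cache footprint. The functions below state those formulas directly; they are a sketch of the math, not necessarily the exact mlsysim implementation.

```python
# Standard first-order pipeline and KV-cache formulas (sketch, not mlsysim code).

def pipeline_bubble_fraction(stages: int, microbatches: int) -> float:
    """Idle fraction of a 1F1B pipeline: (p - 1) / (m + p - 1)."""
    return (stages - 1) / (microbatches + stages - 1)

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, dtype_bytes: int = 2) -> int:
    """Keys + values cached per layer for every token in flight."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

# 8 pipeline stages fed with 64 microbatches idle ~9.9% of each step:
print(pipeline_bubble_fraction(8, 64))               # 7/71 ≈ 0.0986
# A Llama-2-7B-like shape (32 layers, 32 KV heads, d=128) at 4k context:
print(kv_cache_bytes(32, 32, 128, 4096, 1) / 2**30)  # 2.0 (GiB)
```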
- Narrative Scenarios: Implemented the "Lighthouse Archetypes" (Doorbell, AV, Frontier).
- Hierarchy of Constraints: Implemented `SystemEvaluationScorecard` (Feasibility -> Performance -> Macro).
- Concrete Registry: Replaced generic placeholders with 15+ real-world devices (iPhone 15, H200, MI300X, etc.).
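The gating behavior of that hierarchy can be illustrated in a few lines: a scenario that fails an earlier tier is never scored on later ones. Names here are hypothetical, not the actual `SystemEvaluationScorecard` API.

```python
# Illustrative Feasibility -> Performance -> Macro gating (hypothetical names).

def evaluate(fits_in_memory: bool, meets_sla: bool, cost_ok: bool) -> dict:
    card = {"feasibility": fits_in_memory, "performance": None, "macro": None}
    if not fits_in_memory:
        return card                      # infeasible: later tiers never run
    card["performance"] = meets_sla
    if not meets_sla:
        return card                      # SLA violated: macro tier skipped
    card["macro"] = cost_ok
    return card

print(evaluate(False, True, True))
# {'feasibility': False, 'performance': None, 'macro': None}
```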
3. The "No Hallucination" Validation Standard
- Empirical Anchoring: Every solver validated against MLPerf, Megatron-LM, or published training logs.
- Dimensional Analysis: Every formula proven via `pint` unit resolution.
- Traceable Constants: Every constant in `core.constants` cited to a specific datasheet or paper.
Phase 3: Empirical Validation & Documentation [IN PROGRESS - 2025-03-06]
- Deep Narrative Analysis: Completed 32-chapter audit. Integrated `plot_scorecard()` into Volume 1 and the "Memory Wall" case study into Volume 2.
- Empirical Validation Suite: Build `tests/test_empirical.py`.
- Goal: Assert that simulator predictions match MLPerf results within 10%.
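A test in that suite might look like the sketch below: assert a solver prediction against a published measurement within a 10% relative band. Both numbers here are placeholders, not real MLPerf results or mlsysim outputs.

```python
# Sketch of the planned tests/test_empirical.py style (placeholder numbers).

def within_tolerance(predicted: float, measured: float, rel: float = 0.10) -> bool:
    """True when the relative error is inside the validation band."""
    return abs(predicted - measured) <= rel * abs(measured)

def test_training_step_time():
    predicted_s = 1.62          # hypothetical solver output
    measured_s = 1.70           # hypothetical published log value
    assert within_tolerance(predicted_s, measured_s)

test_training_step_time()
print("ok")
```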
Phase 4: Tail Latency & Straggler Physics
- Scope: Probabilistic models for P99/P99.9 latencies in massive fleets.
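The core of that straggler math is the classic "tail at scale" argument: if a request fans out to n workers and must wait for the slowest, a per-worker quantile q becomes q^n at the request level. The sketch below states that formula; it is the standard first-order model, not mlsysim code.

```python
# Quantile amplification under fan-out ("tail at scale", first-order sketch).

def request_level_quantile(per_worker_quantile: float, fanout: int) -> float:
    """Probability that ALL fanout workers beat their latency threshold,
    assuming independent, identically distributed worker latencies."""
    return per_worker_quantile ** fanout

# With 100-way fan-out, 'fast 99% of the time' per worker means the whole
# request is fast only about 37% of the time:
print(request_level_quantile(0.99, 100))   # ≈ 0.366
```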
Phase 5: Automated Documentation (Quartodoc)
- Scope: Generate the full API reference site directly from docstrings.
Phase 6: Live Sourcing & Freshness (Thinking Ahead)
- Goal: Move from hardcoded constants to a "Source-Anchored" registry.
- Action: Implement a `ProvenanceMap` that links physical constants to public dashboards (e.g., Electricity Maps, AWS Pricing API).
- Outcome: A "Verified" badge next to every number in the documentation, with a link to the primary source.
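One way to picture the source-anchored registry: each constant carries its citation and a freshness date, and the documentation badge is rendered from that record. Everything below is a hypothetical sketch, not the actual `ProvenanceMap` design; the H100 TDP and URL are the one real datasheet fact used for illustration.

```python
# Hypothetical sketch of a source-anchored constant registry.
from dataclasses import dataclass

@dataclass(frozen=True)
class SourcedConstant:
    name: str
    value: float
    unit: str
    source_url: str
    retrieved: str            # ISO date the value was last verified

REGISTRY = {
    "h100_tdp": SourcedConstant(
        "h100_tdp", 700.0, "watt",
        "https://www.nvidia.com/en-us/data-center/h100/", "2025-03-06"),
}

def badge(key: str) -> str:
    """Render the 'Verified' badge line shown next to a documented number."""
    c = REGISTRY[key]
    return f"{c.value} {c.unit} [Verified {c.retrieved}]({c.source_url})"

print(badge("h100_tdp"))
```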