Mirror of https://github.com/harvard-edge/cs249r_book.git, synced 2026-05-02 02:29:16 -05:00
feat(mlsysim): add documentation site, typed registries, and 6-solver core
Complete MLSYSIM v0.1.0 implementation with:

- Documentation website (Quarto): landing page with animated hero and capability carousel, 4 tutorials (hello world, LLM serving, distributed training, sustainability), hardware/model/fleet/infra catalogs, solver guide, whitepaper, math foundations, glossary, and full quartodoc API reference
- Typed registry system: Hardware (18 devices across 5 tiers), Models (15 workloads), Systems (fleets, clusters, fabrics), Infrastructure (grid profiles, rack configs, datacenters)
- Core types: Pint-backed Quantity, Metadata provenance tracking, custom exception hierarchy (OOMError, SLAViolation)
- SimulationConfig with YAML/JSON loading and pre-validation
- Scenario system tying workloads to systems with SLA constraints
- Multi-level evaluation scorecard (feasibility, performance, macro)
- Examples, tests, and Jetson Orin NX spec fix (100 → 25 TFLOP/s)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:

52  mlsysim/docs/api/core.solver.DistributedSolver.qmd  Normal file

@@ -0,0 +1,52 @@
# core.solver.DistributedSolver { #mlsysim.core.solver.DistributedSolver }

```python
core.solver.DistributedSolver()
```

Resolves fleet-wide communication, synchronization, and pipelining constraints.

Supports 3D parallelism (DP, TP, PP) and network bisection/oversubscription.

## Methods

| Name | Description |
| --- | --- |
| [solve](#mlsysim.core.solver.DistributedSolver.solve) | Calculates distributed training performance using the 3D Parallelism model. |

### solve { #mlsysim.core.solver.DistributedSolver.solve }

```python
core.solver.DistributedSolver.solve(
    model,
    fleet,
    batch_size=1,
    precision='fp16',
    efficiency=0.5,
    tp_size=1,
    pp_size=1,
    microbatch_count=1,
    topology_override=None,
)
```

Calculates distributed training performance using the 3D Parallelism model.

#### Parameters {.doc-section .doc-section-parameters}

| Name | Type | Description | Default |
|-------------------|----------|------------------------------------------------------|------------|
| model | Workload | The model architecture to simulate. | _required_ |
| fleet | Fleet | The hardware cluster and network topology. | _required_ |
| batch_size | int | Global batch size. | `1` |
| precision | str | Numerical precision (fp16, fp32, int8). | `'fp16'` |
| efficiency | float | Achieved compute efficiency (0.0 to 1.0). | `0.5` |
| tp_size | int | Tensor Parallelism degree (usually intra-node). | `1` |
| pp_size | int | Pipeline Parallelism degree (cross-node stages). | `1` |
| microbatch_count | int | Number of microbatches for pipeline parallelism (M). | `1` |
| topology_override | str | Force a specific topology (ring, tree). | `None` |

#### Returns {.doc-section .doc-section-returns}

| Name | Type | Description |
|--------|------------------|--------------------------------------------------------------------------------|
| | Dict\[str, Any\] | Performance metrics including scaling efficiency and pipeline bubble fraction. |
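The pipeline bubble fraction returned above relates `pp_size` and `microbatch_count` (M). Under the standard GPipe-style schedule, P pipeline stages processing M microbatches leave each stage idle for P − 1 microbatch slots out of M + P − 1 total. The sketch below illustrates that relationship only; it is not the library's actual implementation, and the function name is hypothetical:

```python
def pipeline_bubble_fraction(pp_size: int, microbatch_count: int) -> float:
    """Fraction of pipeline time spent idle (GPipe-style schedule).

    With P pipeline stages and M microbatches, each stage idles for
    (P - 1) microbatch slots out of (M + P - 1) total, giving a
    bubble fraction of (P - 1) / (M + P - 1).

    Hypothetical illustration of the metric described in the docs,
    not mlsysim's actual solver code.
    """
    if pp_size < 1 or microbatch_count < 1:
        raise ValueError("pp_size and microbatch_count must be >= 1")
    return (pp_size - 1) / (microbatch_count + pp_size - 1)


# A single pipeline stage has no bubble.
print(pipeline_bubble_fraction(1, 8))    # 0.0
# 4 stages, 4 microbatches: 3 / 7
print(pipeline_bubble_fraction(4, 4))
```

Increasing `microbatch_count` shrinks the bubble, which is why the default `microbatch_count=1` gives the worst case for any `pp_size > 1`.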