Mirror of https://github.com/harvard-edge/cs249r_book.git, synced 2026-05-02 02:29:16 -05:00
feat(mlsysim): add documentation site, typed registries, and 6-solver core
Complete MLSYSIM v0.1.0 implementation with:

- Documentation website (Quarto): landing page with animated hero and capability carousel, 4 tutorials (hello world, LLM serving, distributed training, sustainability), hardware/model/fleet/infra catalogs, solver guide, whitepaper, math foundations, glossary, and full quartodoc API reference
- Typed registry system: Hardware (18 devices across 5 tiers), Models (15 workloads), Systems (fleets, clusters, fabrics), Infrastructure (grid profiles, rack configs, datacenters)
- Core types: Pint-backed Quantity, Metadata provenance tracking, custom exception hierarchy (OOMError, SLAViolation)
- SimulationConfig with YAML/JSON loading and pre-validation
- Scenario system tying workloads to systems with SLA constraints
- Multi-level evaluation scorecard (feasibility, performance, macro)
- Examples, tests, and Jetson Orin NX spec fix (100 → 25 TFLOP/s)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:

52  mlsysim/docs/api/core.solver.DistributedSolver.qmd  Normal file

@@ -0,0 +1,52 @@
# core.solver.DistributedSolver { #mlsysim.core.solver.DistributedSolver }

```python
core.solver.DistributedSolver()
```

Resolves fleet-wide communication, synchronization, and pipelining constraints.

Supports 3D parallelism (DP, TP, PP) and network bisection/oversubscription.

## Methods

| Name | Description |
| --- | --- |
| [solve](#mlsysim.core.solver.DistributedSolver.solve) | Calculates distributed training performance using the 3D Parallelism model. |

### solve { #mlsysim.core.solver.DistributedSolver.solve }

```python
core.solver.DistributedSolver.solve(
    model,
    fleet,
    batch_size=1,
    precision='fp16',
    efficiency=0.5,
    tp_size=1,
    pp_size=1,
    microbatch_count=1,
    topology_override=None,
)
```

Calculates distributed training performance using the 3D Parallelism model.

#### Parameters {.doc-section .doc-section-parameters}

| Name | Type | Description | Default |
|-------------------|----------|------------------------------------------------------|------------|
| model | Workload | The model architecture to simulate. | _required_ |
| fleet | Fleet | The hardware cluster and network topology. | _required_ |
| batch_size | int | Global batch size. | `1` |
| precision | str | Numerical precision (fp16, fp32, int8). | `'fp16'` |
| efficiency | float | Achieved compute efficiency (0.0 to 1.0). | `0.5` |
| tp_size | int | Tensor Parallelism degree (usually intra-node). | `1` |
| pp_size | int | Pipeline Parallelism degree (cross-node stages). | `1` |
| microbatch_count | int | Number of microbatches for pipeline parallelism (M). | `1` |
| topology_override | str | Force a specific topology (ring, tree). | `None` |

#### Returns {.doc-section .doc-section-returns}

| Name | Type | Description |
|--------|------------------|--------------------------------------------------------------------------------|
| | Dict\[str, Any\] | Performance metrics including scaling efficiency and pipeline bubble fraction. |
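The pipeline bubble fraction returned above relates `pp_size` and `microbatch_count` (M). Under the standard GPipe-style schedule, P pipeline stages processing M microbatches leave each stage idle for P − 1 microbatch slots out of M + P − 1 total. The sketch below illustrates that relationship only; it is not the library's actual implementation, and the function name is hypothetical:

```python
def pipeline_bubble_fraction(pp_size: int, microbatch_count: int) -> float:
    """Fraction of pipeline time spent idle (GPipe-style schedule).

    With P pipeline stages and M microbatches, each stage idles for
    (P - 1) microbatch slots out of (M + P - 1) total, giving a
    bubble fraction of (P - 1) / (M + P - 1).

    Hypothetical illustration of the metric described in the docs,
    not mlsysim's actual solver code.
    """
    if pp_size < 1 or microbatch_count < 1:
        raise ValueError("pp_size and microbatch_count must be >= 1")
    return (pp_size - 1) / (microbatch_count + pp_size - 1)


# A single pipeline stage has no bubble.
print(pipeline_bubble_fraction(1, 8))    # 0.0
# 4 stages, 4 microbatches: 3 / 7
print(pipeline_bubble_fraction(4, 4))
```

Increasing `microbatch_count` shrinks the bubble, which is why the default `microbatch_count=1` gives the worst case for any `pp_size > 1`.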