mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-07 18:18:42 -05:00
# core.solver.DistributedModel { #mlsysim.core.solver.DistributedModel }
```python
core.solver.DistributedModel()
```
Resolves fleet-wide communication, synchronization, and pipelining constraints.
This model captures the constraints of distributed training at scale. It
decomposes a workload across a cluster using 3D Parallelism (DP, TP, PP)
and calculates the resulting communication overheads and idle times
(bubbles) that determine the Model FLOPs Utilization (MFU).
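As an illustration of the bubble term (a simplified sketch of the standard 1F1B schedule analysis, not mlsysim's actual implementation; the function name is hypothetical):

```python
def pipeline_bubble_fraction(pp_size: int, microbatch_count: int) -> float:
    """Idle-time fraction of a 1F1B pipeline schedule.

    With p pipeline stages and m microbatches, (p - 1) warm-up/cool-down
    slots are idle out of (m + p - 1) total slots per iteration.
    """
    p, m = pp_size, microbatch_count
    return (p - 1) / (m + p - 1)

# More microbatches shrink the bubble:
# pp_size=4, microbatch_count=16 -> (4 - 1) / (16 + 4 - 1) = 3/19 ~ 0.158
```

Increasing the microbatch count drives the fraction toward zero, which is why the `microbatch_count` parameter of `solve` trades bubble time against synchronization overhead.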
Literature Sources:

1. Shoeybi et al. (2019), "Megatron-LM: Training Multi-Billion Parameter
   Language Models Using Model Parallelism." (3D Parallelism Framework)
2. Narayanan et al. (2019), "PipeDream: Generalized Pipeline Parallelism
   for DNN Training." (1F1B Pipeline Bubble Model)
3. Patarasuk & Yuan (2009), "Bandwidth Optimal All-reduce Algorithms
   for Clusters of Workstations." (Ring All-Reduce)
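A minimal sketch of the bandwidth-optimal ring all-reduce cost from the third reference (illustrative only, with hypothetical parameter names; latency terms and mlsysim's actual topology handling are omitted):

```python
def ring_allreduce_time(message_bytes: float, n_gpus: int,
                        link_bandwidth_bps: float) -> float:
    """Transfer time of a bandwidth-optimal ring all-reduce.

    Each of the n GPUs sends 2 * (n - 1) / n of the message over its
    link (a reduce-scatter phase followed by an all-gather phase), so
    per-link traffic is nearly independent of cluster size for large n.
    """
    n = n_gpus
    traffic_per_gpu = 2 * (n - 1) / n * message_bytes
    return traffic_per_gpu * 8 / link_bandwidth_bps  # bytes -> bits

# 1 GB gradients, 8 GPUs, 100 Gb/s links:
# 2 * 7/8 * 1e9 bytes = 1.75e9 bytes = 1.4e10 bits -> 0.14 s per all-reduce
```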
## Methods
| Name | Description |
| --- | --- |
| [solve](#mlsysim.core.solver.DistributedModel.solve) | Calculates distributed training performance using the 3D/4D Parallelism model. |
### solve { #mlsysim.core.solver.DistributedModel.solve }
```python
core.solver.DistributedModel.solve(
    model,
    fleet,
    batch_size=1,
    precision='fp16',
    efficiency=0.5,
    tp_size=1,
    pp_size=1,
    ep_size=1,
    v_stages=1,
    microbatch_count=1,
    topology_override=None,
)
```
Calculates distributed training performance using the 3D/4D Parallelism model.
#### Parameters {.doc-section .doc-section-parameters}
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | Workload | The model architecture to simulate. | _required_ |
| fleet | Fleet | The hardware cluster and network topology. | _required_ |
| batch_size | int | Global batch size. | `1` |
| precision | str | Numerical precision (fp16, fp32, int8). | `'fp16'` |
| efficiency | float | Achieved compute efficiency (0.0 to 1.0). | `0.5` |
| tp_size | int | Tensor Parallelism degree. Splits individual layers across GPUs, usually within a single node over high-speed NVLink. | `1` |
| pp_size | int | Pipeline Parallelism degree. Chains model layers across multiple nodes, introducing 'pipeline bubbles' while saving memory. | `1` |
| ep_size | int | Expert Parallelism degree for MoE models. Introduces All-to-All communication overhead across nodes. | `1` |
| v_stages | int | Number of virtual stages for interleaved pipeline schedules. | `1` |
| microbatch_count | int | Number of microbatches (M). Increasing M reduces the pipeline bubble but increases synchronization overhead. | `1` |
| topology_override | str | Force a specific topology (ring, tree). | `None` |
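To illustrate how `efficiency`, `pp_size`, and `microbatch_count` interact (a crude back-of-envelope sketch, not mlsysim's internals; communication overlap and TP/EP costs are ignored, and the function name is hypothetical):

```python
def estimated_mfu(efficiency: float, pp_size: int,
                  microbatch_count: int) -> float:
    """Rough MFU estimate: achieved compute efficiency discounted by
    the 1F1B pipeline bubble fraction (p - 1) / (m + p - 1)."""
    bubble = (pp_size - 1) / (microbatch_count + pp_size - 1)
    return efficiency * (1.0 - bubble)

# efficiency=0.5, pp_size=4, microbatch_count=16:
# bubble = 3/19, so MFU ~ 0.5 * 16/19 ~ 0.421
```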
#### Returns {.doc-section .doc-section-returns}
| Name | Type | Description |
| --- | --- | --- |
| | Dict\[str, Any\] | Metrics including DP/TP/EP latency, the Pipeline Bubble penalty, and the final Scaling Efficiency. |