cs249r_book/mlsysim/docs/api/core.solver.DistributedModel.qmd
Vijay Janapa Reddi 611de228d9 fix(mlsysim): align docs with *Model naming convention
The solver.py refactoring renamed most solver classes from *Solver to
*Model (e.g. DistributedSolver → DistributedModel). The docs still
referenced the old names, causing the Quarto site build to fail with:
  ImportError: cannot import name 'DistributedSolver' from 'mlsysim'

- Fix executable code cells in tutorials/distributed.qmd
- Update non-executable code examples across 10 doc files
- Rename 19 API reference files from *Solver.qmd to *Model.qmd
- SensitivitySolver and SynthesisSolver retain their names (correct)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:39:11 -04:00


# core.solver.DistributedModel { #mlsysim.core.solver.DistributedModel }
```python
core.solver.DistributedModel()
```
Resolves fleet-wide communication, synchronization, and pipelining constraints.

This solver models the constraints of distributed training at scale. It decomposes a workload across a cluster using 3D Parallelism (DP, TP, PP) and calculates the resulting communication overheads and idle times (bubbles) that determine the Model FLOPs Utilization (MFU).
Literature Sources:

1. Shoeybi et al. (2019), "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism." (3D Parallelism Framework)
2. Narayanan et al. (2019), "PipeDream: Generalized Pipeline Parallelism for DNN Training." (1F1B Pipeline Bubble Model)
3. Patarasuk & Mueller (2009), "Bandwidth-Optimal All-Reduce Algorithms for Clusters of Workstations." (Ring All-Reduce)
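To make the "3D Parallelism (DP, TP, PP)" decomposition concrete, the sketch below shows the standard arithmetic relating the three degrees to the cluster size. The variable names and GPU counts are illustrative assumptions, not part of the mlsysim API.

```python
# Illustrative only: how a 3D-parallel decomposition carves up a cluster.
world_size = 1024          # hypothetical total GPU count in the fleet
tp_size, pp_size = 8, 16   # tensor- and pipeline-parallel degrees

# DP * TP * PP must equal the world size, so the data-parallel degree
# is whatever remains after TP and PP have claimed their GPUs.
dp_size = world_size // (tp_size * pp_size)
assert dp_size * tp_size * pp_size == world_size
print(dp_size)  # 8 data-parallel replicas
```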
## Methods
| Name | Description |
| --- | --- |
| [solve](#mlsysim.core.solver.DistributedModel.solve) | Calculates distributed training performance using the 3D/4D Parallelism model. |
### solve { #mlsysim.core.solver.DistributedModel.solve }
```python
core.solver.DistributedModel.solve(
    model,
    fleet,
    batch_size=1,
    precision='fp16',
    efficiency=0.5,
    tp_size=1,
    pp_size=1,
    ep_size=1,
    v_stages=1,
    microbatch_count=1,
    topology_override=None,
)
```
Calculates distributed training performance using the 3D/4D Parallelism model.
#### Parameters {.doc-section .doc-section-parameters}
| Name | Type | Description | Default |
|-------------------|----------|------------------------------------------------------------------------------------------------------------------------------|------------|
| model | Workload | The model architecture to simulate. | _required_ |
| fleet | Fleet | The hardware cluster and network topology. | _required_ |
| batch_size | int | Global batch size. | `1` |
| precision | str | Numerical precision (fp16, fp32, int8). | `'fp16'` |
| efficiency | float | Achieved compute efficiency (0.0 to 1.0). | `0.5` |
| tp_size | int | Tensor Parallelism degree. Splits individual layers across GPUs, usually within a single node over high-speed NVLink. | `1` |
| pp_size | int | Pipeline Parallelism degree. Chains model layers across multiple nodes, introducing 'pipeline bubbles' while saving memory. | `1` |
| ep_size | int | Expert Parallelism degree for MoE models. Introduces All-to-All communication overhead across nodes. | `1` |
| v_stages | int | Number of virtual stages for interleaved pipeline schedules. | `1` |
| microbatch_count | int | Number of microbatches (M). Increasing M reduces the pipeline bubble but increases synchronization overhead (see the bubble-fraction sketch after this table). | `1` |
| topology_override | str | Force a specific topology (ring, tree). | `None` |
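To make the pp_size / microbatch_count trade-off in the table above concrete, the sketch below evaluates the standard 1F1B bubble-fraction formula from the cited Narayanan et al. schedule. It is a hand-computed illustration of the relationship this solver models, not a call into mlsysim.

```python
# Standard 1F1B pipeline bubble fraction: (p - 1) / (m + p - 1),
# where p is the pipeline depth and m the number of microbatches.
def bubble_fraction(pp_size: int, microbatch_count: int) -> float:
    return (pp_size - 1) / (microbatch_count + pp_size - 1)

# With 8 pipeline stages, going from 8 to 64 microbatches shrinks the
# idle "bubble" from roughly 47% of each step to under 10%.
print(bubble_fraction(8, 8))   # ~0.467
print(bubble_fraction(8, 64))  # ~0.099
```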
#### Returns {.doc-section .doc-section-returns}
| Name | Type | Description |
|--------|------------------|-----------------------------------------------------------------------------------------------------|
| | Dict\[str, Any\] | Metrics including DP/TP/EP latency, the Pipeline Bubble penalty, and the final Scaling Efficiency. |
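A minimal usage sketch of the method documented above. The top-level import mirrors the one referenced in the commit message; the `model` and `fleet` objects are assumed to be a Workload and a Fleet constructed elsewhere, since their constructors are not documented on this page, and the chosen parallelism degrees are illustrative.

```python
from mlsysim import DistributedModel  # top-level import per the commit message above

model = ...  # assumed: a Workload describing the architecture to simulate
fleet = ...  # assumed: a Fleet describing the GPU cluster and network topology

solver = DistributedModel()
metrics = solver.solve(
    model,
    fleet,
    batch_size=512,
    precision='fp16',
    tp_size=8,             # split individual layers across GPUs within a node
    pp_size=4,             # chain layer groups across nodes
    microbatch_count=32,   # more microbatches -> smaller pipeline bubble
)
print(metrics)  # dict with DP/TP/EP latency, pipeline bubble penalty, scaling efficiency
```

The returned dictionary can then be inspected for the pipeline bubble penalty and scaling efficiency described in the Returns table.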