
# core.solver.DistributedModel { #mlsysim.core.solver.DistributedModel }

```python
core.solver.DistributedModel()
```

Resolves fleet-wide communication, synchronization, and pipelining constraints.

This solver models the constraints that arise when training is scaled across
a cluster. It decomposes a workload using 3D Parallelism (data, tensor, and
pipeline parallelism: DP, TP, PP) and calculates the resulting communication
overheads and idle times (pipeline bubbles) that determine the Model FLOPs
Utilization (MFU).
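
The pipeline bubble mentioned above can be estimated with the standard closed form for 1F1B-style schedules. The sketch below is illustrative only: the function name `pipeline_bubble_fraction` is hypothetical, and it is an assumption that the solver uses this exact expression. It assumes equal-cost stages and microbatches, so that `pp_size - 1` slots sit idle during warm-up and drain, and that interleaving with `v_stages` virtual stages shrinks that idle time by a factor of `v_stages`.

```python
def pipeline_bubble_fraction(pp_size: int, microbatch_count: int, v_stages: int = 1) -> float:
    """Fraction of a training step spent idle in the pipeline (sketch).

    Assumes equal-cost stages and microbatches. This is the textbook
    1F1B estimate, not necessarily the formula used by the solver.
    """
    if pp_size < 1 or microbatch_count < 1 or v_stages < 1:
        raise ValueError("pp_size, microbatch_count, and v_stages must be >= 1")
    # Idle slots during pipeline warm-up and drain.
    bubble_slots = pp_size - 1
    # Useful slots scale with the number of microbatches; interleaving
    # makes each bubble slot v_stages times shorter relative to them.
    return bubble_slots / (v_stages * microbatch_count + bubble_slots)


if __name__ == "__main__":
    for m in (4, 16, 64):
        print(f"pp=4, M={m}: bubble = {pipeline_bubble_fraction(4, m):.3f}")
```

Under these assumptions, raising `microbatch_count` or `v_stages` lowers the bubble fraction, while a single-stage pipeline (`pp_size=1`) has no bubble at all.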

Literature Sources:

1. Shoeybi et al. (2019), "Megatron-LM: Training Multi-Billion Parameter
   Language Models Using Model Parallelism." (3D Parallelism Framework)
2. Narayanan et al. (2019), "PipeDream: Generalized Pipeline Parallelism
   for DNN Training." (1F1B Pipeline Bubble Model)
3. Patarasuk & Yuan (2009), "Bandwidth Optimal All-Reduce Algorithms
   for Clusters of Workstations." (Ring All-Reduce)
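
The ring all-reduce cost cited above is commonly written as 2(N-1) communication steps, each moving 1/N of the message. The following is a minimal sketch of that alpha-beta cost model, not the solver's implementation: `ring_allreduce_seconds` and its parameters are hypothetical names, and the model ignores reduction compute and link contention.

```python
def ring_allreduce_seconds(
    message_bytes: float,
    n_workers: int,
    bandwidth_bytes_per_s: float,
    latency_s: float = 0.0,
) -> float:
    """Estimated time for one ring all-reduce (alpha-beta cost model sketch)."""
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth_bytes_per_s must be positive")
    if n_workers == 1:
        return 0.0  # nothing to synchronize with a single worker
    # Reduce-scatter takes N-1 steps, all-gather takes another N-1 steps.
    steps = 2 * (n_workers - 1)
    # Each step sends one chunk of size message/N to the ring neighbour.
    chunk_bytes = message_bytes / n_workers
    return steps * (latency_s + chunk_bytes / bandwidth_bytes_per_s)


if __name__ == "__main__":
    # Hypothetical numbers: 8 GB of gradients, 8 workers, 100 GB/s links.
    print(ring_allreduce_seconds(8e9, 8, 100e9))
```

Because the bandwidth term approaches 2 × message size / bandwidth as N grows, the cost in this model is nearly independent of the number of workers, which is why the ring algorithm is called bandwidth-optimal.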

## Methods

| Name | Description |
| --- | --- |
| [solve](#mlsysim.core.solver.DistributedModel.solve) | Calculates distributed training performance using the 3D/4D Parallelism model. |

### solve { #mlsysim.core.solver.DistributedModel.solve }

```python
core.solver.DistributedModel.solve(
    model,
    fleet,
    batch_size=1,
    precision='fp16',
    efficiency=0.5,
    tp_size=1,
    pp_size=1,
    ep_size=1,
    v_stages=1,
    microbatch_count=1,
    topology_override=None,
)
```

Calculates distributed training performance using the 3D/4D Parallelism model (DP, TP, and PP, plus EP for MoE models).

#### Parameters {.doc-section .doc-section-parameters}

| Name              | Type     | Description                                                                                                                  | Default    |
|-------------------|----------|------------------------------------------------------------------------------------------------------------------------------|------------|
| model             | Workload | The model architecture to simulate.                                                                                          | _required_ |
| fleet             | Fleet    | The hardware cluster and network topology.                                                                                   | _required_ |
| batch_size        | int      | Global batch size.                                                                                                           | `1`        |
| precision         | str      | Numerical precision (fp16, fp32, int8).                                                                                      | `'fp16'`   |
| efficiency        | float    | Achieved compute efficiency (0.0 to 1.0).                                                                                    | `0.5`      |
| tp_size           | int      | Tensor Parallelism degree. Splits individual layers across GPUs,  usually within a single node over high-speed NVLink.       | `1`        |
| pp_size           | int      | Pipeline Parallelism degree. Chains model layers across multiple  nodes, introducing 'pipeline bubbles' while saving memory. | `1`        |
| ep_size           | int      | Expert Parallelism degree for MoE models. Introduces All-to-All communication overhead across nodes.                         | `1`        |
| v_stages          | int      | Number of virtual stages for interleaved pipeline schedules.                                                                 | `1`        |
| microbatch_count  | int      | Number of microbatches (M). Increasing M reduces the pipeline  bubble but increases synchronization overhead.                | `1`        |
| topology_override | str      | Force a specific topology (ring, tree).                                                                                      | `None`     |
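
The parallelism degrees in the table have to be consistent with the size of the fleet. The sketch below shows the usual bookkeeping, assuming that the data-parallel degree is whatever remains after tensor and pipeline parallelism are applied, and that the global batch is split evenly across replicas and microbatches. Both helper names are hypothetical, and whether `solve` derives these quantities the same way is an assumption.

```python
def data_parallel_degree(world_size: int, tp_size: int = 1, pp_size: int = 1) -> int:
    """Number of model replicas left after TP and PP are applied (sketch)."""
    model_parallel = tp_size * pp_size
    if model_parallel < 1 or world_size % model_parallel != 0:
        raise ValueError("world_size must be a positive multiple of tp_size * pp_size")
    return world_size // model_parallel


def microbatch_size(batch_size: int, dp_size: int, microbatch_count: int) -> int:
    """Samples per microbatch on each replica, assuming an even split."""
    shards = dp_size * microbatch_count
    if shards < 1 or batch_size % shards != 0:
        raise ValueError("batch_size must be a positive multiple of dp_size * microbatch_count")
    return batch_size // shards


if __name__ == "__main__":
    # Hypothetical cluster: 64 accelerators with TP=8 inside a node and PP=4.
    dp = data_parallel_degree(64, tp_size=8, pp_size=4)
    print(dp, microbatch_size(512, dp, microbatch_count=8))
```

Checking these divisibility constraints before running a simulation is a cheap way to catch configurations that could not be laid out on the cluster.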

#### Returns {.doc-section .doc-section-returns}

| Name   | Type             | Description                                                                                         |
|--------|------------------|-----------------------------------------------------------------------------------------------------|
|        | Dict\[str, Any\] | Metrics including DP/TP/EP latency, the Pipeline Bubble penalty,  and the final Scaling Efficiency. |