# core.solver.DistributedModel { #mlsysim.core.solver.DistributedModel }

```python
core.solver.DistributedModel()
```

Resolves fleet-wide communication, synchronization, and pipelining constraints.

This solver models the constraints of training at distributed scale. It decomposes a workload across a cluster using 3D Parallelism (DP, TP, PP) and calculates the resulting communication overheads and idle times (pipeline bubbles) that determine the Model FLOPs Utilization (MFU).

Literature Sources:

1. Shoeybi et al. (2019), "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism." (3D Parallelism framework)
2. Narayanan et al. (2019), "PipeDream: Generalized Pipeline Parallelism for DNN Training." (1F1B pipeline bubble model)
3. Patarasuk & Yuan (2009), "Bandwidth Optimal All-Reduce Algorithms for Clusters of Workstations." (ring all-reduce)

## Methods

| Name | Description |
| --- | --- |
| [solve](#mlsysim.core.solver.DistributedModel.solve) | Calculates distributed training performance using the 3D/4D Parallelism model. |

### solve { #mlsysim.core.solver.DistributedModel.solve }

```python
core.solver.DistributedModel.solve(
    model,
    fleet,
    batch_size=1,
    precision='fp16',
    efficiency=0.5,
    tp_size=1,
    pp_size=1,
    ep_size=1,
    v_stages=1,
    microbatch_count=1,
    topology_override=None,
)
```

Calculates distributed training performance using the 3D/4D Parallelism model.

#### Parameters {.doc-section .doc-section-parameters}

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | Workload | The model architecture to simulate. | _required_ |
| fleet | Fleet | The hardware cluster and network topology. | _required_ |
| batch_size | int | Global batch size. | `1` |
| precision | str | Numerical precision (`fp16`, `fp32`, `int8`). | `'fp16'` |
| efficiency | float | Achieved compute efficiency (0.0 to 1.0). | `0.5` |
| tp_size | int | Tensor Parallelism degree. Splits individual layers across GPUs, usually within a single node over high-speed NVLink. | `1` |
| pp_size | int | Pipeline Parallelism degree. Chains model layers across multiple nodes, saving memory but introducing pipeline bubbles. | `1` |
| ep_size | int | Expert Parallelism degree for MoE models. Introduces All-to-All communication overhead across nodes. | `1` |
| v_stages | int | Number of virtual stages for interleaved pipeline schedules. | `1` |
| microbatch_count | int | Number of microbatches (M). Increasing M shrinks the pipeline bubble but adds synchronization overhead. | `1` |
| topology_override | str | Force a specific collective topology (`ring`, `tree`). | `None` |

#### Returns {.doc-section .doc-section-returns}

| Name | Type | Description |
| --- | --- | --- |
|  | Dict\[str, Any\] | Metrics including DP/TP/EP latency, the pipeline bubble penalty, and the final scaling efficiency. |
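
#### Notes {.doc-section .doc-section-notes}

The idle-time and communication terms the solver accounts for follow the cited papers. As a rough guide to how `pp_size`, `microbatch_count`, and `v_stages` interact, here is a minimal sketch of the two published closed forms; it is an illustration of those formulas, not a transcription of mlsysim's internals.

```python
def pipeline_bubble_fraction(pp_size: int, microbatch_count: int, v_stages: int = 1) -> float:
    """Idle fraction of a 1F1B pipeline schedule (Narayanan et al., 2019).

    Standard 1F1B idles for (pp_size - 1) microbatch slots per iteration;
    an interleaved schedule shrinks that term by a factor of v_stages.
    """
    bubble_slots = (pp_size - 1) / v_stages
    return bubble_slots / (microbatch_count + bubble_slots)


def ring_allreduce_seconds(payload_bytes: float, world_size: int, link_gbps: float) -> float:
    """Bandwidth term of a ring all-reduce (Patarasuk & Yuan, 2009).

    Reduce-scatter plus all-gather moves 2 * (p - 1) / p of the payload
    over each rank's link, which is bandwidth-optimal for large payloads.
    """
    bytes_per_second = link_gbps * 1e9 / 8  # Gb/s -> B/s
    return 2 * (world_size - 1) / world_size * payload_bytes / bytes_per_second


# 8 pipeline stages with 32 microbatches idle ~18% of the step;
# interleaving two virtual stages cuts that to ~10%.
print(f"{pipeline_bubble_fraction(8, 32):.3f}")              # 0.179
print(f"{pipeline_bubble_fraction(8, 32, v_stages=2):.3f}")  # 0.099
```

This is why the parameter table advises a large M: the bubble fraction decays roughly as (pp_size − 1) / M once M dominates the pipeline depth.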
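
#### Examples {.doc-section .doc-section-examples}

A hypothetical call, sketched from the signature above. The `Workload` and `Fleet` constructors are not documented on this page, so their arguments are elided and the import paths are assumptions; only the `solve` parameters are taken from this reference.

```python
from mlsysim.core.solver import DistributedModel
from mlsysim.core import Workload, Fleet  # import path assumed

model = Workload(...)  # constructor arguments not covered on this page
fleet = Fleet(...)

# 64 GPUs: tensor parallelism inside each NVLink node, pipeline
# parallelism across nodes, and M >> pp_size to keep the bubble small.
metrics = DistributedModel().solve(
    model,
    fleet,
    batch_size=512,
    precision="fp16",
    tp_size=8,
    pp_size=8,
    microbatch_count=64,
)
print(metrics)  # DP/TP/EP latency, pipeline bubble penalty, scaling efficiency
```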