# core.solver.ScalingModel { #mlsysim.core.solver.ScalingModel }

```python
core.solver.ScalingModel()
```

Analyzes the 'Scaling Physics' of model training (Chinchilla Laws).
This solver determines the optimal model size (P) and dataset size (D)
given a compute budget (C), following the compute-optimal training
regime where D ≈ 20P.

Literature Sources:

1. Hoffmann et al. (2022), "Training Compute-Optimal Large Language Models."
2. Kaplan et al. (2020), "Scaling Laws for Neural Language Models."
3. McCandlish et al. (2018), "An Empirical Model of Large-Batch Training."

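The compute-optimal regime described above can be sketched in plain Python. This is a minimal illustration assuming the standard approximations from Hoffmann et al. (2022), namely C ≈ 6·P·D and D ≈ 20·P; it is not the library's actual implementation, and `chinchilla_optimal` is a hypothetical helper, not part of the `mlsysim` API.

```python
import math

def chinchilla_optimal(compute_budget_flops: float) -> tuple[float, float]:
    """Return (params P, tokens D) assuming C = 6*P*D and D = 20*P.

    Substituting D = 20*P into C = 6*P*D gives C = 120*P**2,
    hence P = sqrt(C / 120) and D = 20*P.
    """
    params = math.sqrt(compute_budget_flops / 120.0)
    tokens = 20.0 * params
    return params, tokens

# Example: ~5.76e23 FLOPs, roughly the budget Hoffmann et al. report
# for Chinchilla (≈70B parameters trained on ≈1.4T tokens).
p, d = chinchilla_optimal(5.76e23)
```

Running this yields roughly 6.9e10 parameters and 1.4e12 tokens, matching the published Chinchilla configuration.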
## Methods

| Name | Description |
| --- | --- |
| [solve](#mlsysim.core.solver.ScalingModel.solve) | Solves for compute-optimal model and dataset parameters. |

### solve { #mlsysim.core.solver.ScalingModel.solve }

```python
core.solver.ScalingModel.solve(compute_budget, target_model_size=None)
```

Solves for compute-optimal model and dataset parameters.
#### Parameters {.doc-section .doc-section-parameters}

| Name | Type | Description | Default |
|-------------------|----------|----------------------------------------------------------------------------|------------|
| compute_budget | Quantity | Total training budget (e.g., in TFLOPs or H100-GPU-days). | _required_ |
| target_model_size | Quantity | If provided, calculates the required tokens for this specific model size. | `None` |

#### Returns {.doc-section .doc-section-returns}

| Name | Type | Description |
|--------|------------------|-------------------------------------------------------------------|
| | Dict\[str, Any\] | Optimal parameters, token count, and training duration estimates. |
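The `target_model_size` branch can also be illustrated in isolation: with P fixed, the affordable token count follows directly from C ≈ 6·P·D. This is a sketch under that assumed relation; `tokens_for_model` is a hypothetical helper for illustration, not the method's actual internals.

```python
def tokens_for_model(compute_budget_flops: float, model_params: float) -> float:
    """Tokens D trainable on budget C with a fixed model size P,
    assuming the approximation C = 6 * P * D (so D = C / (6 * P))."""
    return compute_budget_flops / (6.0 * model_params)

# A 7e9-parameter model on a 1e23-FLOP budget:
d_fixed = tokens_for_model(1e23, 7e9)  # ≈ 2.38e12 tokens
```

Note that this is far more than 20·P ≈ 1.4e11 tokens, i.e. a 7B model on this budget would be trained well past the compute-optimal point.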