# core.solver.ScalingModel { #mlsysim.core.solver.ScalingModel }

```python
core.solver.ScalingModel()
```

Analyzes the 'Scaling Physics' of model training (Chinchilla Laws).
This solver determines the optimal model size (P) and dataset size (D)
given a compute budget (C), following the compute-optimal training
regime where D ≈ 20P.

Literature Sources:

1. Hoffmann et al. (2022), "Training Compute-Optimal Large Language Models."
2. Kaplan et al. (2020), "Scaling Laws for Neural Language Models."
3. McCandlish et al. (2018), "An Empirical Model of Large-Batch Training."

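The compute-optimal regime described above can be sketched in plain Python. This is a minimal illustration assuming the standard approximations from Hoffmann et al. (2022), namely C ≈ 6·P·D and D ≈ 20·P; it is not the library's actual implementation, and `chinchilla_optimal` is a hypothetical helper, not part of the `mlsysim` API.

```python
import math

def chinchilla_optimal(compute_budget_flops: float) -> tuple[float, float]:
    """Return (params P, tokens D) assuming C = 6*P*D and D = 20*P.

    Substituting D = 20*P into C = 6*P*D gives C = 120*P**2,
    hence P = sqrt(C / 120) and D = 20*P.
    """
    params = math.sqrt(compute_budget_flops / 120.0)
    tokens = 20.0 * params
    return params, tokens

# Example: ~5.76e23 FLOPs, roughly the budget Hoffmann et al. report
# for Chinchilla (≈70B parameters trained on ≈1.4T tokens).
p, d = chinchilla_optimal(5.76e23)
```

Running this yields roughly 6.9e10 parameters and 1.4e12 tokens, matching the published Chinchilla configuration.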
## Methods

| Name | Description |
| --- | --- |
| [solve](#mlsysim.core.solver.ScalingModel.solve) | Solves for compute-optimal model and dataset parameters. |

### solve { #mlsysim.core.solver.ScalingModel.solve }

```python
core.solver.ScalingModel.solve(compute_budget, target_model_size=None)
```

Solves for compute-optimal model and dataset parameters.
#### Parameters {.doc-section .doc-section-parameters}

| Name | Type | Description | Default |
|-------------------|----------|----------------------------------------------------------------------------|------------|
| compute_budget | Quantity | Total training budget (e.g., in TFLOPs or H100-GPU-days). | _required_ |
| target_model_size | Quantity | If provided, calculates the required tokens for this specific model size. | `None` |

#### Returns {.doc-section .doc-section-returns}

| Name | Type | Description |
|--------|------------------|-------------------------------------------------------------------|
| | Dict\[str, Any\] | Optimal parameters, token count, and training duration estimates. |
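The `target_model_size` branch can also be illustrated in isolation: with P fixed, the affordable token count follows directly from C ≈ 6·P·D. This is a sketch under that assumed relation; `tokens_for_model` is a hypothetical helper for illustration, not the method's actual internals.

```python
def tokens_for_model(compute_budget_flops: float, model_params: float) -> float:
    """Tokens D trainable on budget C with a fixed model size P,
    assuming the approximation C = 6 * P * D (so D = C / (6 * P))."""
    return compute_budget_flops / (6.0 * model_params)

# A 7e9-parameter model on a 1e23-FLOP budget:
d_fixed = tokens_for_model(1e23, 7e9)  # ≈ 2.38e12 tokens
```

Note that this is far more than 20·P ≈ 1.4e11 tokens, i.e. a 7B model on this budget would be trained well past the compute-optimal point.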