mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-07 18:18:42 -05:00
The solver.py refactoring renamed most solver classes from *Solver to *Model (e.g. DistributedSolver → DistributedModel). The docs still referenced the old names, causing the Quarto site build to fail with:

ImportError: cannot import name 'DistributedSolver' from 'mlsysim'

- Fix executable code cells in tutorials/distributed.qmd
- Update non-executable code examples across 10 doc files
- Rename 19 API reference files from *Solver.qmd to *Model.qmd
- SensitivitySolver and SynthesisSolver retain their names (correct)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# core.solver.OrchestrationModel { #mlsysim.core.solver.OrchestrationModel }

```python
core.solver.OrchestrationModel()
```

Analyzes Cluster Orchestration and Queueing (Little's Law).

This solver models the 'Wait Wall' in shared research clusters,
calculating job completion times and researcher wait times based on
cluster utilization and arrival rates.

Literature Source:

1. Little (1961), "A Proof for the Queuing Formula: L = λW."
2. Barroso et al. (2018), "The Datacenter as a Computer" (Cluster Mgmt).
3. Jeon et al. (2019), "Analysis of Large-Scale Multi-Tenant GPU Clusters."

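
Little's Law, cited above, relates the long-run average number of jobs in a stable system (L) to the arrival rate (λ) and the average time a job spends in the system (W). A minimal sketch of that relation follows; the function name `jobs_in_system` and the numbers are hypothetical and not part of `mlsysim`.

```python
def jobs_in_system(
    arrival_rate_jobs_per_day: float,
    avg_time_in_system_days: float,
) -> float:
    """Little's Law: L = λ * W (long-run averages of a stable system)."""
    if arrival_rate_jobs_per_day < 0 or avg_time_in_system_days < 0:
        raise ValueError("rate and time must be non-negative")
    return arrival_rate_jobs_per_day * avg_time_in_system_days


# Hypothetical numbers: 4 jobs arrive per day and each spends 2.5 days
# in the cluster (queued plus running), so 10 jobs are present on average.
print(jobs_in_system(4.0, 2.5))  # 10.0
```

The same relation can be rearranged to estimate W from an observed L and λ, which is often how wait times are inferred from cluster logs.
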
## Methods

| Name | Description |
| --- | --- |
| [solve](#mlsysim.core.solver.OrchestrationModel.solve) | Solves for cluster wait times and utilization. |

### solve { #mlsysim.core.solver.OrchestrationModel.solve }

```python
core.solver.OrchestrationModel.solve(
    fleet,
    arrival_rate_jobs_per_day,
    avg_job_duration_days,
)
```

Solves for cluster wait times and utilization.

#### Parameters {.doc-section .doc-section-parameters}

| Name                      | Type   | Description                                                      | Default    |
|---------------------------|--------|------------------------------------------------------------------|------------|
| fleet                     | Fleet  | The hardware cluster configuration.                              | _required_ |
| arrival_rate_jobs_per_day | float  | λ: Rate at which new training jobs are submitted.                | _required_ |
| avg_job_duration_days     | float  | The average time a job takes to run if it has the whole cluster. | _required_ |

#### Returns {.doc-section .doc-section-returns}

| Name   | Type             | Description                                        |
|--------|------------------|----------------------------------------------------|
|        | Dict\[str, Any\] | Wait time, system length, and utilization metrics. |
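
The reference above does not spell out the formula behind `solve`. One common way to turn the two numeric inputs into wait-time, system-length, and utilization figures is to treat the whole cluster as a single server in an M/M/1 queue. That modeling choice, the function name `mm1_cluster_metrics`, and the dictionary keys below are all assumptions made for illustration, not the `mlsysim` implementation, and the sketch ignores the `fleet` argument entirely.

```python
from typing import Any, Dict


def mm1_cluster_metrics(
    arrival_rate_jobs_per_day: float,
    avg_job_duration_days: float,
) -> Dict[str, Any]:
    """Hypothetical M/M/1 estimate; not the mlsysim implementation."""
    if arrival_rate_jobs_per_day < 0 or avg_job_duration_days <= 0:
        raise ValueError("need a non-negative rate and a positive duration")
    # Utilization: fraction of time the cluster is busy (rho = lambda * S).
    utilization = arrival_rate_jobs_per_day * avg_job_duration_days
    if utilization >= 1:
        raise ValueError("queue is unstable unless utilization < 1")
    # Mean time a job waits before it starts (M/M/1: Wq = rho / (1 - rho) * S).
    wait_days = utilization / (1 - utilization) * avg_job_duration_days
    # Mean time in system = waiting + running.
    time_in_system_days = wait_days + avg_job_duration_days
    # Little's Law (L = lambda * W): mean number of jobs queued or running.
    jobs_in_system = arrival_rate_jobs_per_day * time_in_system_days
    return {
        "utilization": utilization,
        "wait_days": wait_days,
        "time_in_system_days": time_in_system_days,
        "jobs_in_system": jobs_in_system,
    }


# Hypothetical workload: 0.4 jobs submitted per day, each needing 2 days.
metrics = mm1_cluster_metrics(
    arrival_rate_jobs_per_day=0.4,
    avg_job_duration_days=2.0,
)
print(metrics)
```

Under these assumptions the cluster is 80% utilized and the estimated queueing delay is about 8 days, four times the run time, which illustrates how sharply waits grow as utilization approaches 1.
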