# core.solver.TailLatencyModel { #mlsysim.core.solver.TailLatencyModel }

```python
core.solver.TailLatencyModel()
```

Analyzes queueing delays and P99 tail latency for deployed inference models.

Models inference servers as M/M/c queues to determine if the deployment
can sustain the target arrival rate while meeting strict SLA latency bounds.

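In standard queueing notation, the M/M/c model referred to above can be summarized as follows. This is a textbook sketch, not a statement of how `TailLatencyModel` is implemented; the symbols $\lambda$, $\mu$, $c$, $a$, and $\rho$ are notation introduced here, and mapping them to `arrival_rate_qps`, `service_latency_ms`, and `num_replicas` is an assumption.

$$
\begin{aligned}
\lambda &= \text{arrival rate (requests/s)}, \quad \mu = \frac{1000}{\text{service latency in ms}}, \quad c = \text{number of replicas} \\
a &= \frac{\lambda}{\mu}, \qquad \rho = \frac{a}{c} = \frac{\lambda}{c\,\mu}, \qquad \text{stable iff } \rho < 1 \\
C(c, a) &= \frac{\dfrac{a^{c}}{c!}\,\dfrac{1}{1-\rho}}{\displaystyle\sum_{k=0}^{c-1} \frac{a^{k}}{k!} + \dfrac{a^{c}}{c!}\,\dfrac{1}{1-\rho}} \\
\Pr(W_q > t) &= C(c, a)\, e^{-(c\mu - \lambda)\,t}, \qquad t \ge 0
\end{aligned}
$$

Here $C(c, a)$ is the Erlang C probability that an arriving request has to queue, and $W_q$ is the time spent waiting before service begins. As $\rho$ approaches 1, the exponent $c\mu - \lambda$ shrinks toward zero, which is why tail percentiles grow much faster than the mean near saturation.
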
Literature Source:
1. Dean & Barroso (2013), "The Tail at Scale."

## Methods

| Name | Description |
| --- | --- |
| [solve](#mlsysim.core.solver.TailLatencyModel.solve) | Solves for P50 and P99 tail latencies under variable load. |

### solve { #mlsysim.core.solver.TailLatencyModel.solve }

```python
core.solver.TailLatencyModel.solve(
    arrival_rate_qps,
    service_latency_ms,
    num_replicas=1,
)
```

Solves for P50 and P99 tail latencies under variable load.

#### Parameters {.doc-section .doc-section-parameters}

| Name | Type | Description | Default |
|------|------|-------------|---------|
| arrival_rate_qps | float | Request arrival rate in queries per second. | _required_ |
| service_latency_ms | float | Average service latency per request in milliseconds. | _required_ |
| num_replicas | int | Number of inference replicas (servers). | `1` |

#### Returns {.doc-section .doc-section-returns}

| Name | Type | Description |
|------|------|-------------|
| | TailLatencyResult | P50 latency, P99 latency, queue utilization, stability flag, and SLO violation probability. |

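To make the inputs above concrete, here is a minimal standalone sketch of how a queueing-delay percentile could be derived from the same three parameters under textbook M/M/c assumptions. It does not call `mlsysim`, and it should not be read as the library's implementation: the helper names `erlang_c` and `queue_wait_percentile_ms` are hypothetical, and the result covers only time spent waiting in the queue, not the full request latency.

```python
import math


def erlang_c(num_servers: int, offered_load: float) -> float:
    """Erlang C: probability that an arriving request must queue (M/M/c)."""
    rho = offered_load / num_servers
    if rho >= 1.0:
        return 1.0  # saturated system: treat every request as queued
    # Sum of a^k / k! for k = 0 .. c-1
    head = sum(offered_load**k / math.factorial(k) for k in range(num_servers))
    # Term for k = c, scaled by 1 / (1 - rho)
    tail = offered_load**num_servers / math.factorial(num_servers) / (1.0 - rho)
    return tail / (head + tail)


def queue_wait_percentile_ms(
    arrival_rate_qps: float,
    service_latency_ms: float,
    num_replicas: int = 1,
    percentile: float = 0.99,
) -> float:
    """Hypothetical helper: queueing-delay percentile in ms for an M/M/c queue."""
    service_rate = 1000.0 / service_latency_ms      # requests/s per replica
    offered_load = arrival_rate_qps / service_rate  # a = lambda / mu
    if offered_load / num_replicas >= 1.0:
        return math.inf                             # unstable: queue grows without bound
    p_wait = erlang_c(num_replicas, offered_load)
    if p_wait <= 1.0 - percentile:
        return 0.0                                  # this percentile of requests never queues
    drain_rate = num_replicas * service_rate - arrival_rate_qps  # c*mu - lambda
    # Invert P(W_q > t) = p_wait * exp(-drain_rate * t) at 1 - percentile
    return 1000.0 * math.log(p_wait / (1.0 - percentile)) / drain_rate


if __name__ == "__main__":
    # One replica, 10 ms mean service time, 80 QPS (80% utilization)
    print(queue_wait_percentile_ms(80.0, 10.0, num_replicas=1))
```

With one replica at 80% utilization, the P99 queueing delay from this sketch is about 219 ms, more than twenty times the 10 ms mean service time. With four replicas at 1 QPS, it returns 0 because fewer than 1% of requests queue at all.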