# models.TransformerWorkload { #mlsysim.models.TransformerWorkload }

```python
models.TransformerWorkload()
```

## Methods

| Name | Description |
| --- | --- |
| [get_kv_cache_size](#mlsysim.models.TransformerWorkload.get_kv_cache_size) | Calculates memory footprint for the KV cache. |

### get_kv_cache_size { #mlsysim.models.TransformerWorkload.get_kv_cache_size }

```python
models.TransformerWorkload.get_kv_cache_size(
    seq_len,
    batch_size,
    precision=BYTES_FP16,
)
```

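Calculates memory footprint for the KV cache.

The result depends on the sequence length (`seq_len`), the number of sequences in the batch (`batch_size`), and the storage precision per element (`precision`, FP16 by default).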
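As a rough illustration of what this method computes, the sketch below applies the commonly used KV-cache sizing formula: each layer stores one key tensor and one value tensor, each holding `batch_size × n_kv_heads × seq_len × head_dim` elements. The class `SketchTransformer`, its fields, and the method name `kv_cache_bytes` are hypothetical, and `BYTES_FP16 = 2` is an assumption. mlsysim's actual `get_kv_cache_size` may derive the tensor shape differently (for instance from a single hidden dimension) and may return a unit-aware quantity rather than a plain integer.

```python
from dataclasses import dataclass

# Assumption: one FP16 element occupies 2 bytes (hypothetical stand-in for mlsysim's constant).
BYTES_FP16 = 2


@dataclass
class SketchTransformer:
    """Hypothetical stand-in for a transformer workload's shape parameters."""
    n_layers: int
    n_kv_heads: int
    head_dim: int

    def kv_cache_bytes(self, seq_len: int, batch_size: int, precision: int = BYTES_FP16) -> int:
        # Per layer: one K and one V tensor (factor of 2), each shaped
        # [batch_size, n_kv_heads, seq_len, head_dim].
        elements_per_layer = 2 * batch_size * self.n_kv_heads * seq_len * self.head_dim
        # Sum over layers, then convert element count to bytes.
        return self.n_layers * elements_per_layer * precision


# Example: 32 layers, 32 KV heads of dimension 128, a single 4096-token sequence in FP16.
model = SketchTransformer(n_layers=32, n_kv_heads=32, head_dim=128)
size = model.kv_cache_bytes(seq_len=4096, batch_size=1)
print(f"{size / 2**30:.1f} GiB")  # prints "2.0 GiB"
```

Every factor enters the product linearly, so doubling either `seq_len` or `batch_size` doubles the cache size, and halving `precision` (for example, storing the cache in 8-bit form) halves it.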