mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-23 07:23:03 -05:00
Replace 9 old tutorials with 12 new numbered tutorials (00-11) covering roofline through full-stack audit. Redesign landing page, add models-and-solvers and extending-the-engine guides. Add __main__.py, cli.py, and cli/ package for command-line interface.
28 lines
2.7 KiB
Plaintext
# API Reference {.doc .doc-index}

## Core API

Primary objects and solvers.

| | |
| --- | --- |
| [hardware](hardware.qmd#mlsysim.hardware) | |
| [models](models.qmd#mlsysim.models) | |
| [infra](infra.qmd#mlsysim.infra) | |
| [systems](systems.qmd#mlsysim.systems) | |
| [core](core.qmd#mlsysim.core) | |
| [core.solver.SingleNodeModel](core.solver.SingleNodeModel.qmd#mlsysim.core.solver.SingleNodeModel) | Resolves single-node hardware Roofline bounds and feasibility. |
| [core.solver.ServingModel](core.solver.ServingModel.qmd#mlsysim.core.solver.ServingModel) | Analyzes the two-phase LLM serving lifecycle: Pre-fill vs. Decoding. |
| [core.solver.DistributedModel](core.solver.DistributedModel.qmd#mlsysim.core.solver.DistributedModel) | Resolves fleet-wide communication, synchronization, and pipelining constraints. |
| [core.solver.DataModel](core.solver.DataModel.qmd#mlsysim.core.solver.DataModel) | Analyzes the 'Data Wall': the throughput bottleneck between storage and compute. |
| [core.solver.ScalingModel](core.solver.ScalingModel.qmd#mlsysim.core.solver.ScalingModel) | Analyzes the 'Scaling Physics' of model training (Chinchilla Laws). |
| [core.solver.OrchestrationModel](core.solver.OrchestrationModel.qmd#mlsysim.core.solver.OrchestrationModel) | Analyzes Cluster Orchestration and Queueing (Little's Law). |
| [core.solver.CompressionModel](core.solver.CompressionModel.qmd#mlsysim.core.solver.CompressionModel) | Analyzes model compression trade-offs (Accuracy vs. Efficiency). |
| [core.solver.SustainabilityModel](core.solver.SustainabilityModel.qmd#mlsysim.core.solver.SustainabilityModel) | Calculates Datacenter-scale Sustainability metrics. |
| [core.solver.EconomicsModel](core.solver.EconomicsModel.qmd#mlsysim.core.solver.EconomicsModel) | Calculates Total Cost of Ownership (TCO) including Capex and Opex. |
| [core.solver.ContinuousBatchingModel](core.solver.ContinuousBatchingModel.qmd#mlsysim.core.solver.ContinuousBatchingModel) | Analyzes production LLM serving with Continuous Batching and PagedAttention. |
| [core.solver.WeightStreamingModel](core.solver.WeightStreamingModel.qmd#mlsysim.core.solver.WeightStreamingModel) | Analyzes Wafer-Scale inference (e.g., Cerebras CS-3) using Weight Streaming. |
| [core.solver.TailLatencyModel](core.solver.TailLatencyModel.qmd#mlsysim.core.solver.TailLatencyModel) | Analyzes queueing delays and P99 tail latency for deployed inference (M/M/c). |
| [core.solver.ReliabilityModel](core.solver.ReliabilityModel.qmd#mlsysim.core.solver.ReliabilityModel) | Calculates Mean Time Between Failures (MTBF) and optimal checkpointing intervals. |
| [core.solver.CheckpointModel](core.solver.CheckpointModel.qmd#mlsysim.core.solver.CheckpointModel) | Analyzes checkpoint I/O burst penalties and MFU impact. |
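
Two of the relationships named in the table, the Roofline bound and Little's Law, are compact enough to sketch directly. The snippet below is a minimal illustration of the textbook formulas only. It does not use or reproduce the `mlsysim` API: the function names, parameters, and hardware numbers are hypothetical and were chosen for this example.

```python
# Illustrative sketch only: these helpers are NOT part of mlsysim.
# All names and numbers below are hypothetical.


def roofline_attainable_flops(
    peak_flops: float, mem_bandwidth: float, arithmetic_intensity: float
) -> float:
    """Roofline bound: attainable FLOP/s = min(peak, bandwidth * intensity).

    peak_flops           -- peak compute throughput in FLOP/s
    mem_bandwidth        -- memory bandwidth in bytes/s
    arithmetic_intensity -- FLOPs performed per byte moved
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def littles_law_in_flight(arrival_rate: float, mean_latency: float) -> float:
    """Little's Law: mean items in the system L = arrival rate * mean time in system."""
    return arrival_rate * mean_latency


# Hypothetical accelerator: 100 TFLOP/s peak compute, 2 TB/s memory bandwidth.
peak = 100e12
bandwidth = 2e12

# The ridge point is the intensity at which the two roofline limits meet.
ridge_point = peak / bandwidth

# Below the ridge point the kernel is memory-bound; above it, compute-bound.
memory_bound = roofline_attainable_flops(peak, bandwidth, 10.0)
compute_bound = roofline_attainable_flops(peak, bandwidth, 200.0)

# Hypothetical queue: 50 requests/s arriving, 0.2 s mean time in system.
in_flight = littles_law_in_flight(50.0, 0.2)

print(f"ridge point:   {ridge_point:.1f} FLOP/byte")
print(f"memory-bound:  {memory_bound:.3e} FLOP/s")
print(f"compute-bound: {compute_bound:.3e} FLOP/s")
print(f"in flight:     {in_flight:.1f} requests")
```

The solver pages linked above document the actual class interfaces. This sketch only shows the kind of first-order bound that the table's descriptions refer to.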