cs249r_book/mlsysim/docs/cli-reference.qmd
Vijay Janapa Reddi 152b8630dc fix(ci): clear all 8 failing pre-commit hooks on dev (#1413)
* fix(content): clear two mitpress-above-below pre-commit failures

The "📚 Book ·  Validate (Dev)" workflow has been failing on dev for
8+ consecutive runs because the mitpress-above-below pre-commit hook
flags spatial references like "above"/"below" inside body prose and
figure captions (the MIT Press style guide wants @sec-/@fig- cross-refs
or "earlier"/"later" instead). Two pre-existing violations were tripping
the hook on every push:

  - book/quarto/contents/vol1/responsible_engr/responsible_engr.qmd:1604
    fig-cap for fig-data-governance-pillars said "obligations discussed
    below: privacy, security, compliance, and transparency" — but those
    four obligations are *immediately* listed in the same caption, so
    "discussed below" was redundant. Reworded to "obligations of
    privacy, security, compliance, and transparency …".

  - book/quarto/contents/vol2/network_fabrics/network_fabrics.qmd:1217
    fig-cap for fig-congestion-cascade said "the PFC backpressure
    cascades described below." Reworded to "described later in this
    section." which is what the hook wants.

After our 4 release-prep merges (PR-1/2/7/12) cleaned up the other
hook failures (spelling, bibtex tidy, pipe tables, contractions,
mitpress-vs-period, …), this was the last remaining failing hook.
Verified locally:

  pre-commit run mitpress-above-below --all-files
  MIT Press: No above/below spatial refs (use cross-refs).....Passed

These are pure copy-edits to figure captions; no semantic change to
the diagrams or surrounding text.

* fix(check-internal-links): suppress 4 categories of false positives

The Tier 1 link checker (shipped in PR #1404) was over-eager and
flagged author content as broken in four documented patterns:

1. TikZ source inside HTML comments. Link regex matched `\node[mycycle](B1)`
   as a Markdown link `[mycycle](B1)`. Fix: strip `<!-- ... -->` bodies
   before scanning, preserving line/column offsets so any *real* failure
   we report stays accurate.
2. Quarto cross-references like `[Foo](@sec-bar)`, `@fig-x`, `@tbl-y`.
   These resolve through the project xref index at render time, not the
   filesystem; book/binder owns that validation. Fix: skip targets whose
   first token is `@sec-/@fig-/@tbl-/@eq-/@lst-/@thm-/@cor-/@def-/@exr-/
   @exm-/@prp-`.
3. Uppercase URL schemes (`HTTPS://`, `HTTP://`) — common after mobile
   auto-capitalize or copied citations. Fix: case-insensitive prefix
   match for the EXTERNAL_SCHEMES tuple.
4. GitHub-style emoji-prefix slugs in `.md` READMEs (e.g.
   `## 🎯 20 Progressive Modules` produces anchor `#-20-progressive-modules`
   on github.com, but Pandoc would slugify to `progressive-modules`).
   Fix: register both Pandoc-style and GitHub-style slugs as valid
   anchors so neither rendering target trips the checker.

Drops repo-wide broken-link count from 150 → 84 (false positives only;
no real link rot is masked). Real rot is fixed in a separate commit so
the checker improvement can be reviewed independently.

* fix(content): repair internal-link rot across 10 files

Concrete link rot the new checker (PR #1404) surfaced once its false
positives were cleared. None of these are stylistic; each link points
at a path or anchor that does not exist.

- README/README_{zh,ja,ko}.md (24 links): translation files live in
  README/ so paths to repo-root targets need a `../` prefix
  (`book/README.md` -> `../book/README.md`, etc.).
- mlsysim/docs/contributing.qmd (21 links): `../slides/...` pointed
  inside `mlsysim/`; the slides root is two levels up
  (`../../slides/...`).
- mlsysim/docs/cli-reference.qmd: `getting-started.qmd#bring-your-own-yaml-byoy`
  removed; retarget to `#defining-custom-models` (closest surviving
  section about user-supplied model specs).
- mlsysim/docs/for-engineers.qmd, for-instructors.qmd:
  `solver-guide.qmd#extending-mlsysim` no longer exists; retarget to
  `#writing-a-custom-solver` (the surviving custom-solver guide).
- book/tools/scripts/README.md: `../docs/BINDER.md` resolved to
  `book/tools/docs/BINDER.md` (nonexistent); the file actually lives
  at `book/docs/BINDER.md`, which is `../../docs/BINDER.md` from here.
- book/quarto/contents/frontmatter/index.qmd:
  `about.qmd#about-the-book-unnumbered` anchor was removed when the
  About heading was simplified; drop the anchor so the link lands at
  the top of the page (which IS the About section).
- tinytorch/datasets/tinytalks/README.md: `scripts/README.md` was
  never created; point at the directory listing instead.

* chore(pre-commit): exclude 3 forward-looking files from internal-link checker

Three files reference content that does not (yet) exist on the
filesystem; the references are intentional rather than rot, so they
should not block CI:

- labs/index.qmd: lists the 33 planned labs (vol1/lab_00..lab_16,
  vol2/lab_01..lab_16) as a roadmap. Links go live as each lab ships.
  De-linking now would lose the visual roadmap; as each lab lands,
  the exclusion narrows naturally.
- labs/PROTOCOL.md, labs/TEMPLATE.md: internal authoring docs that
  reference `../.claude/docs/labs/{PROTOCOL,TEMPLATE}.md`. The
  `.claude/` tree is per-worktree and not always present at the same
  relative path; these are author-tooling refs, not user-facing.

Net effect: the link checker is now green on a clean checkout. The
exclude block uses comments per existing convention so the rationale
is discoverable from the config alone.

* fix(content): clear codespell, contractions, and vs. pre-commit failures

Three pre-existing pre-commit hooks were failing on the dev branch
prior to the release-prep merges. Each is a small content normalization:

- codespell (2): re-declares -> redeclares (book/quarto/config/shared/README.md);
  unparseable -> unparsable (handled in the check-internal-links rewrite).
- contractions (2):
  * socratiq/socratiq.qmd callout: "If you're" -> "If you are".
  * nn_architectures fig-alt for the attention-visualization figure:
    "didn't" -> "did not". Alt-text is descriptive prose for screen
    readers, not a verbatim transcription of pixels, so expanding the
    contraction matches MIT Press style without changing the figure
    itself.
- mitpress-vs-period (6): bare `vs` -> `vs.` per MIT Press 2026 §10.5
  in benchmarking.qmd, distributed_training.qmd (x3 across two Python
  docstrings rendered in code listings), fault_tolerance.qmd, and
  inference.qmd. Code-listing strings are visible prose in the rendered
  PDF, so the rule applies there as well.

* chore: bibtex-tidy auto-format outputs

Outputs of the bibtex-tidy pre-commit hook (which auto-fixes its own
input). Picked up here so that running pre-commit on a clean checkout
no longer reports a "files were modified" failure for the same files
on every invocation. Pure formatting; no entry semantics changed.
2026-04-20 12:58:28 -04:00

---
title: "CLI Reference"
subtitle: "Every command, every flag, with real examples."
---
MLSys·im ships an agent-ready CLI built on [Typer](https://typer.tiangolo.com/) and [Rich](https://rich.readthedocs.io/). It follows the [3-Tier Command Mapping](architecture.qmd): `eval` maps to Models, `optimize` maps to Optimizers, and `zoo` maps to the registries.
::: {.callout-tip}
## Output Formats
Every command supports `-o json` for machine-parseable output and `-o markdown` for reports. The default is `text` (human-readable Rich tables). AI agents should always use `-o json`.
:::
---
## Quick Examples
```bash
# What's in the Zoo?
mlsysim zoo hardware
mlsysim zoo models
# Single-node roofline: is Llama-3 8B memory-bound on H100?
mlsysim eval Llama3_8B H100
# Same thing, but with batch size 32 and fp8 precision
mlsysim eval Llama3_8B H100 --batch-size 32 --precision fp8
# Full cluster evaluation from a YAML spec
mlsysim eval cluster.yaml
# Machine-readable JSON for CI/CD pipelines
mlsysim eval Llama3_8B H100 -o json
# Export JSON Schema for IDE autocompletion
mlsysim schema --type hardware > hardware.schema.json
```
---
## Exit Codes
The CLI uses semantic exit codes so scripts and CI pipelines can react programmatically:
| Code | Meaning | Example |
|:-----|:--------|:--------|
| `0` | Success | Analysis completed, all assertions passed |
| `1` | Bad input | Unknown model name, malformed YAML, missing required flag |
| `2` | Physics violation | OOM — model does not fit in memory at the given precision |
| `3` | SLA violation | A `constraints.assert` check in the YAML failed |
```bash
mlsysim eval Llama3_70B T4 --batch-size 1
# Exit code 2: OOM — 140 GB model weights exceed 16 GB T4 memory
echo $? # → 2
```
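In CI, these codes can be mapped to pass/fail outcomes. A minimal sketch (the message strings are illustrative, not CLI output):

```shell
# Map mlsysim's semantic exit codes to CI outcomes.
# Usage in a pipeline:  mlsysim eval cluster.yaml; gate_on_exit $?
gate_on_exit() {
  case "$1" in
    0) echo "pass" ;;
    1) echo "fail: bad input" ;;
    2) echo "fail: physics violation (OOM)" ;;
    3) echo "fail: SLA assertion" ;;
    *) echo "fail: unexpected exit code $1" ;;
  esac
}
```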
---
## Global Options
```
mlsysim [OPTIONS] COMMAND [ARGS]...
```
| Flag | Description | Default |
|:-----|:-----------|:--------|
| `-o, --output` | Output format: `text`, `json`, `markdown` | `text` |
| `--install-completion` | Install shell completion (bash, zsh, fish) | — |
| `--show-completion` | Print completion script to stdout | — |
| `--help` | Show help and exit | — |
---
## `mlsysim zoo`
Explore the built-in registries (the MLSys Zoo).
```
mlsysim zoo [CATEGORY]
```
**Arguments:**
| Argument | Description |
|:---------|:-----------|
| `CATEGORY` | `hardware` or `models` |
**Examples:**
```bash
# List all hardware in the Zoo with specs
mlsysim zoo hardware
# List all models with parameter counts and FLOPs
mlsysim zoo models
# JSON output for scripting
mlsysim zoo hardware -o json
```
---
## `mlsysim eval`
Evaluate the analytical physics of an ML system. This is the primary command — it runs the roofline analysis and returns bottleneck, latency, throughput, and memory usage.
```
mlsysim eval [OPTIONS] TARGET [HARDWARE]
```
**Arguments:**
| Argument | Description | Required |
|:---------|:-----------|:---------|
| `TARGET` | Model name (e.g., `Llama3_8B`) or path to `mlsys.yaml` | Yes |
| `HARDWARE` | Hardware name (e.g., `H100`) — required when TARGET is a model name | Conditional |
**Options:**
| Flag | Description | Default |
|:-----|:-----------|:--------|
| `-b, --batch-size` | Batch size | `1` |
| `-p, --precision` | Numerical precision: `fp32`, `fp16`, `fp8`, `int8`, `int4` | `fp16` |
| `-e, --efficiency` | Model FLOPs Utilization (0.0-1.0) | `0.5` |
**Examples:**
```bash
# Quick check: is ResNet-50 memory-bound on A100?
mlsysim eval ResNet50 A100
# LLM inference at batch 1 (typical serving scenario)
mlsysim eval Llama3_8B H100 --batch-size 1 --precision fp16
# Quantized inference
mlsysim eval Llama3_8B H100 --batch-size 32 --precision int8 --efficiency 0.35
# Full cluster evaluation with SLA assertions
mlsysim eval cluster.yaml
# JSON for CI/CD — fails with exit code 3 if SLA assertions fail
mlsysim eval cluster.yaml -o json
```
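The memory-bound vs. compute-bound verdict comes from comparing a model's arithmetic intensity to the hardware's ridge point. A back-of-envelope version of that comparison (the H100-class numbers below are illustrative, not Zoo values):

```shell
# Ridge point = peak compute / memory bandwidth, in FLOPs per byte.
# Kernels whose arithmetic intensity falls below it are memory-bound.
peak_gflops=989000      # ~989 fp16 TFLOP/s, expressed in GFLOP/s
bandwidth_gbs=3350      # ~3.35 TB/s HBM, in GB/s
ridge=$(( peak_gflops / bandwidth_gbs ))
echo "ridge point: ${ridge} FLOPs/byte"
```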
### YAML Cluster Evaluation
When `TARGET` is a YAML file, `eval` runs the full 3-lens scorecard (Feasibility, Performance, Macro) including distributed training, economics, and sustainability analysis.
```yaml
version: "1.0"
workload:
  name: "Llama3_70B"
  batch_size: 4096
hardware:
  name: "H100"
  nodes: 64
ops:
  region: "Quebec"
  duration_days: 14.0
constraints:
  assert:
    - metric: "performance.latency"
      max: 50.0
```
---
## `mlsysim schema`
Export JSON Schema for configuration files. Feed these to your IDE for autocompletion or to an LLM agent for structured generation.
```
mlsysim schema [OPTIONS]
```
**Options:**
| Flag | Description |
|:-----|:-----------|
| `--type` | Schema type: `hardware`, `workload`, or `plan` |
**Examples:**
```bash
# Get the hardware YAML schema for IDE autocompletion
mlsysim schema --type hardware > hardware.schema.json
# Get the workload schema
mlsysim schema --type workload > workload.schema.json
# Get the full cluster plan schema (for mlsys.yaml files)
mlsysim schema --type plan > plan.schema.json
```
---
## `mlsysim optimize`
Search the design space for optimal configurations. Each subcommand maps to an Optimizer in the 3-Tier architecture.
```
mlsysim optimize COMMAND [ARGS]...
```
### `mlsysim optimize parallelism`
Find the optimal (TP, PP, DP) split to maximize Model FLOPs Utilization.
```
mlsysim optimize parallelism CONFIG_FILE
```
| Argument | Description | Required |
|:---------|:-----------|:---------|
| `CONFIG_FILE` | Path to `mlsys.yaml` with fleet definition | Yes |
**Example:**
```bash
# Find the best parallelism strategy for a 70B model on 256 H100s
mlsysim optimize parallelism cluster.yaml
```
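The search space is the set of factorizations TP × PP × DP = GPU count; the optimizer scores each split for MFU. A sketch that only enumerates the candidates for a small 16-GPU example (illustrative, not the optimizer's code):

```shell
# Enumerate candidate (TP, PP, DP) splits of a 16-GPU job.
gpus=16
count=0
for tp in 1 2 4 8 16; do
  for pp in 1 2 4 8 16; do
    if [ $(( gpus % (tp * pp) )) -eq 0 ]; then
      dp=$(( gpus / (tp * pp) ))
      echo "TP=$tp PP=$pp DP=$dp"
      count=$(( count + 1 ))
    fi
  done
done
echo "candidates: $count"
```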
### `mlsysim optimize batching`
Find the maximum safe batch size that satisfies a P99 latency SLA.
```
mlsysim optimize batching [OPTIONS] CONFIG_FILE
```
| Flag | Description | Required |
|:-----|:-----------|:---------|
| `--sla-ms` | P99 latency SLA in milliseconds | Yes |
| `--qps` | Arrival rate in queries per second | Yes |
**Example:**
```bash
# Max batch size for 50ms P99 at 100 QPS
mlsysim optimize batching cluster.yaml --sla-ms 50 --qps 100
```
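A quick sanity bound on the answer comes from Little's law: in steady state, requests in flight ≈ arrival rate × latency, so a batch larger than that concurrency only adds queueing delay. A sketch of the arithmetic (illustrative; not the optimizer's queueing model):

```shell
# Little's law: L = lambda * W. At 100 QPS with a 50 ms P99 budget,
# only ~5 requests coexist, bounding the useful batch size.
qps=100
sla_ms=50
in_flight=$(( qps * sla_ms / 1000 ))
echo "concurrency bound: $in_flight"
```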
### `mlsysim optimize placement`
Find the optimal datacenter region to minimize TCO and carbon footprint.
```
mlsysim optimize placement [OPTIONS] CONFIG_FILE
```
| Flag | Description | Default |
|:-----|:-----------|:--------|
| `--carbon-tax` | Carbon tax penalty in $/ton CO₂ | `100.0` |
**Example:**
```bash
# Find cheapest region with $150/ton carbon penalty
mlsysim optimize placement cluster.yaml --carbon-tax 150
```
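The carbon tax folds emissions into a single dollar objective: effective cost = energy cost + tax × tons of CO₂ emitted in that region. A worked sketch with made-up numbers (not Zoo data), showing how a cleaner grid can win despite pricier power:

```shell
# Compare two hypothetical regions under a $150/ton carbon penalty.
tax=150
cost_a=100000; tons_a=80   # region A: cheap power, dirty grid
cost_b=110000; tons_b=10   # region B: pricier power, clean grid
eff_a=$(( cost_a + tax * tons_a ))   # 112000
eff_b=$(( cost_b + tax * tons_b ))   # 111500 -> B wins
echo "A: $eff_a  B: $eff_b"
```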
---
## `mlsysim audit`
Profile a workload against the Iron Law and report which wall binds.
```
mlsysim audit [OPTIONS]
```
| Flag | Description |
|:-----|:-----------|
| `--workload` | Workload name to audit |
---
## Bring Your Own YAML
Instead of using registry names, you can pass custom hardware or workload YAML files directly to `eval`:
```bash
# Custom chip spec against a Zoo model
mlsysim eval Llama3_8B ./my_custom_chip.yaml --batch-size 32
# Both custom
mlsysim eval ./my_model.yaml ./my_chip.yaml
```
See [Getting Started — Defining Custom Models](getting-started.qmd#defining-custom-models) for the model definition format.
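A minimal custom-chip file might look like the following. The field names here are assumptions for illustration only; export the authoritative format with `mlsysim schema --type hardware`.

```yaml
# my_custom_chip.yaml -- illustrative field names; see the exported schema
name: "MyAccelerator"
peak_flops_tflops: 400       # fp16 dense
memory_gb: 48
memory_bandwidth_gbs: 1800
```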