---
title: "Geography is a Systems Variable"
subtitle: "Same cluster, same model, same duration — but does location change the cost?"
description: "Compare identical training runs across four grid regions to discover whether geography matters more than hardware choice or training duration for carbon footprint."
categories: ["ops", "intermediate"]
---

## The Question

You have a 256-GPU cluster training a model for 30 days. Does it matter *where* that
cluster is located? Not for latency or throughput — those are fixed by the hardware. But
for carbon emissions, water usage, and total cost of ownership, does geography matter —
and if so, by how much?

::: {.callout-note}
## Prerequisites

Complete [Tutorial 1: The Memory Wall](01_memory_wall.qmd). No other prerequisites
are required — this tutorial can be completed independently.
:::

::: {.callout-note}
## What You Will Learn

- **Calculate** the carbon footprint of identical training runs in different regions
- **Quantify** the gap between the cleanest and dirtiest electricity grids
- **Compare** geography vs. training duration as levers for sustainability
- **Apply** the `EconomicsModel` to show how carbon pricing changes the cheapest option
:::

::: {.callout-tip}
## Background: Grid Carbon Intensity

Every kilowatt-hour of electricity has a carbon cost, measured in grams of CO2 per kWh
(gCO2/kWh). This number depends entirely on how the electricity is generated:

| Region | Primary Source | Carbon Intensity |
|:-------|:---------------|:-----------------|
| Quebec | Hydroelectric | ~20 gCO2/kWh |
| Norway | Hydroelectric | ~29 gCO2/kWh |
| US Average | Mixed (gas, coal, renewables) | ~390 gCO2/kWh |
| Poland | Coal-dominated | ~820 gCO2/kWh |

The range is wide. How wide — and whether it matters more than other levers like
training duration or hardware choice — is what this tutorial quantifies. The sketch
below previews the arithmetic.
:::
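
The underlying arithmetic is simply carbon = energy × grid intensity. Here is a
back-of-envelope sketch in plain Python, assuming roughly 10 kW of average draw per
DGX H100 node (an illustrative figure, not a value from the Zoo) and ignoring PUE:

```python
# Back-of-envelope estimate: carbon = energy * grid intensity.
# NODE_POWER_KW is an assumed, illustrative figure; intensities are the
# approximate values from the table above. PUE is ignored here.
NODES = 32
NODE_POWER_KW = 10            # assumed average draw per DGX H100 node
HOURS = 30 * 24               # 30-day training run

energy_kwh = NODES * NODE_POWER_KW * HOURS   # ~230,000 kWh of IT energy
intensities = {"Quebec": 20, "Norway": 29, "US Average": 390, "Poland": 820}

for region, g_per_kwh in intensities.items():
    tonnes = energy_kwh * g_per_kwh / 1e6    # grams -> tonnes
    print(f"{region:>10}: {tonnes:6.1f} t CO2")

print(f"Poland/Quebec: {intensities['Poland'] / intensities['Quebec']:.0f}x")
```
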
---

## 1. Setup

```{python}
#| echo: false
#| output: false
import mlsysim  # installed via `pip install mlsysim` (see workflow)
Engine = mlsysim.Engine
```

```python
import mlsysim
from mlsysim import Engine
```

---

## 2. Two-Region Comparison

Let's run the same training job in two locations: Quebec (hydroelectric) and Poland
(coal-dominated). Same fleet, same model, same 30-day duration. The only variable
is where the electricity comes from.

```{python}
from mlsysim import SustainabilityModel, Systems
from mlsysim.systems.types import Fleet, Node, NetworkFabric
from mlsysim.core.constants import Q_
from mlsysim.show import table, info

# 256-GPU cluster: 32 DGX H100 nodes
fleet = Fleet(
    name="256-GPU Training Cluster",
    node=Systems.Nodes.DGX_H100,
    count=32,
    fabric=Systems.Fabrics.InfiniBand_NDR,
)

solver = SustainabilityModel()

# Quebec: hydroelectric grid
res_quebec = solver.solve(
    fleet=fleet, duration_days=30,
    datacenter=mlsysim.Infra.Grids.Quebec,
)

# Poland: coal-heavy grid
res_poland = solver.solve(
    fleet=fleet, duration_days=30,
    datacenter=mlsysim.Infra.Grids.Poland,
)

carbon_q = res_quebec.carbon_footprint_kg / 1000  # tonnes
carbon_p = res_poland.carbon_footprint_kg / 1000
ratio = carbon_p / carbon_q if carbon_q > 0 else 0

table(
    ["Region", "Carbon (tonnes CO2)"],
    [
        ["Quebec (Hydro)", f"{carbon_q:.1f}"],
        ["Poland (Coal)", f"{carbon_p:.1f}"],
    ],
)
info(Ratio=f"{ratio:.0f}x")
```

Same cluster. Same model. Same duration. The carbon footprint differs by roughly
**40x** depending on the electricity grid. This is not an optimization — it is a
location decision.

---

## 3. All-Region Sweep

Let's expand the comparison to all four grid regions in the Infrastructure Zoo,
adding energy consumption, water usage, and PUE to the picture.

```{python}
grids = [
    mlsysim.Infra.Grids.Quebec,
    mlsysim.Infra.Grids.Norway,
    mlsysim.Infra.Grids.US_Avg,
    mlsysim.Infra.Grids.Poland,
]

region_results = {}
rows = []
for grid in grids:
    r = solver.solve(fleet=fleet, duration_days=30, datacenter=grid)
    energy_mwh = r.total_energy_kwh.magnitude / 1000
    carbon_t = r.carbon_footprint_kg / 1000
    water_kl = r.water_usage_liters / 1000
    region_results[r.region_name] = r
    rows.append([r.region_name, f"{energy_mwh:,.1f}", f"{carbon_t:,.1f}", f"{water_kl:,.1f}", f"{r.pue:.2f}"])

table(["Region", "Energy (MWh)", "Carbon (t)", "Water (kL)", "PUE"], rows)
```

Notice that energy consumption also varies between regions because of different PUE
values. A modern liquid-cooled facility (PUE 1.1) wastes less energy on cooling than
a legacy air-cooled datacenter (PUE 1.6). But the dominant factor is carbon intensity
— it creates the 40x gap. The sketch below shows how PUE enters the arithmetic.
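
A minimal sketch of that relationship, using only the definition PUE = total facility
energy ÷ IT energy. The IT-energy figure reuses the illustrative ~230 MWh estimate from
the Background sketch, not engine output:

```python
# PUE multiplies IT energy into total facility energy:
#   total_kwh = it_kwh * pue
it_kwh = 230_400  # illustrative IT energy from the back-of-envelope sketch

for pue in (1.1, 1.6):
    total_kwh = it_kwh * pue
    overhead_mwh = (total_kwh - it_kwh) / 1000
    print(f"PUE {pue}: total {total_kwh / 1000:,.0f} MWh "
          f"({overhead_mwh:,.0f} MWh of cooling/facility overhead)")
```
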
---

## 4. Geography vs. Training Duration

Is it better to train longer in a clean region or shorter in a dirty region? Let's
compare 30 days in Quebec against just 10 days in Poland.

```{python}
# 30 days in Quebec
res_30d_quebec = solver.solve(
    fleet=fleet, duration_days=30,
    datacenter=mlsysim.Infra.Grids.Quebec,
)

# 10 days in Poland (1/3 the training time)
res_10d_poland = solver.solve(
    fleet=fleet, duration_days=10,
    datacenter=mlsysim.Infra.Grids.Poland,
)

c_q = res_30d_quebec.carbon_footprint_kg / 1000
c_p = res_10d_poland.carbon_footprint_kg / 1000

table(
    ["Scenario", "Carbon (tonnes CO2)"],
    [
        ["30 days in Quebec", f"{c_q:.1f}"],
        ["10 days in Poland", f"{c_p:.1f}"],
    ],
)
info(Ratio=f"{c_p/c_q:.1f}x")
```

::: {.callout-important}
## Key Insight

**Geography is a larger lever than training duration for carbon footprint.** Even
training for one-third the time in Poland produces more carbon than the full 30-day
run in Quebec. The carbon intensity gap between hydro and coal grids is so large that
no reasonable reduction in training time can compensate, as the break-even sketch
below shows. For any organization serious about sustainable AI, datacenter location
is not a logistics detail — it is a first-order systems design decision with 40x impact.
:::
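
A back-of-envelope check of that claim, using the intensities from the Background
table and assuming equal fleet power in both regions: carbon scales as
duration × intensity, so the Poland run that matches Quebec's 30-day footprint
lasts 30 × (20 / 820) days.

```python
# Break-even duration: carbon ~ duration * intensity (same fleet, same power).
quebec_days, quebec_g, poland_g = 30, 20, 820
breakeven_days = quebec_days * quebec_g / poland_g
print(f"Break-even Poland duration: {breakeven_days:.2f} days")  # ~0.73 days
```

Anything longer than roughly 17 hours in Poland emits more than the entire 30-day
Quebec run.
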
---

## 5. Economic Angle: When Carbon Has a Price

What happens when carbon emissions carry a financial cost? Carbon pricing (through
taxes or cap-and-trade) changes the economics of datacenter location. Let's compute
TCO with a carbon price of $50/tonne.

```{python}
from mlsysim import EconomicsModel

econ = EconomicsModel()
carbon_price = 50  # USD per tonne CO2

rows = []
for grid in grids:
    tco = econ.solve(fleet=fleet, duration_days=30, grid=grid)
    carbon_cost = (tco.carbon_footprint_kg / 1000) * carbon_price
    total = tco.tco_usd + carbon_cost
    rows.append([tco.region_name, f"${tco.tco_usd:,.0f}", f"${carbon_cost:,.0f}", f"${total:,.0f}"])

table(["Region", "TCO ($)", "Carbon Cost ($)", "Total ($)"], rows)
```

At $50/tonne, carbon pricing adds a visible cost differential between regions. At
higher carbon prices (some jurisdictions already charge $100+/tonne), the difference
becomes even more pronounced, potentially shifting which region offers the lowest
TCO. The crossover price can be computed directly, as sketched below.
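
Total cost is linear in the carbon price: total(p) = TCO + tonnes × p, so for any two
regions the break-even price solves TCO_A + t_A·p = TCO_B + t_B·p. The sketch below
uses made-up TCO and carbon figures (hypothetical, for illustration only; substitute
the numbers printed by the `EconomicsModel` cell above):

```python
# Hypothetical figures for illustration only -- replace with the tco_usd and
# carbon tonnes from the EconomicsModel results above.
tco_dirty, t_dirty = 1_000_000, 190.0   # assumed: cheaper TCO, coal-like grid
tco_clean, t_clean = 1_050_000, 4.6     # assumed: pricier TCO, hydro-like grid

# Break-even: tco_dirty + t_dirty * p == tco_clean + t_clean * p
p_star = (tco_clean - tco_dirty) / (t_dirty - t_clean)
print(f"Crossover carbon price: ${p_star:,.0f}/tonne")  # ~$270/tonne here
```

With these made-up inputs the clean region wins above roughly $270/tonne; Exercise 2
below asks you to find the real crossover.
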
---

## Your Turn

::: {.callout-caution}
## Exercises

**Exercise 1: Predict before you compute.**
Training for 30 days in Quebec vs. 10 days in Poland — which produces more carbon?
Write your prediction, then run both scenarios. Were you right? What does this tell
you about the relative magnitude of grid carbon intensity vs. training duration?

**Exercise 2: At what carbon price does geography change the cheapest option?**
Sweep carbon price from $0 to $500/tonne in steps of $50. For each price, calculate
the total cost (TCO + carbon cost) for all four regions. At what price does a region
other than the default cheapest become the best option? Print a table showing the
crossover.

**Exercise 3: Sweep PUE from 1.0 to 2.0.**
Create custom grid profiles using `from mlsysim.infra.types import GridProfile` with
US Average carbon intensity but varying PUE. Sweep PUE from 1.0 to 2.0 in steps of
0.1. How much does total energy increase? At what PUE does facility overhead exceed
the IT energy itself?

**Self-check:** If you train for 30 days in Quebec (20 gCO2/kWh) vs. 15 days in
Poland (820 gCO2/kWh), and both use the same fleet and power, which produces more
total carbon? Show the mental calculation: the ratio of carbon intensities is 41x,
and the ratio of durations is 2x, so Poland is still 41/2 ≈ 20x worse.
:::

---

## Key Takeaways

::: {.callout-tip}
## Summary

- **Grid carbon intensity creates a 40x gap** between the cleanest (Quebec, ~20 gCO2/kWh) and dirtiest (Poland, ~820 gCO2/kWh) regions
- **Geography dominates training duration** as a sustainability lever: 10 days in Poland emits more than 30 days in Quebec
- **PUE amplifies energy use**, but carbon intensity is the dominant factor in emissions
- **Carbon pricing changes the economics**: at $50-100/tonne, location becomes a financial variable, not just an environmental one
- **Datacenter location is a systems design decision** with first-order impact on sustainability and, increasingly, on cost
:::

---

## Next Steps

- **[The $9M Question](08_nine_million_dollar.qmd)** -- Quantify the infrastructure cost of chain-of-thought reasoning
- **[Scaling to 1000 GPUs](06_scaling_1000_gpus.qmd)** -- Discover the hidden reliability cost at scale
- **[Sensitivity Analysis](09_sensitivity.qmd)** -- Use sensitivity sweeps to find which parameter matters most
- **[Infrastructure Zoo](../zoo/infra.qmd)** -- Browse all regional grid profiles and datacenter configurations