cs249r_book

mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-05-22 22:33:28 -05:00

Author	SHA1	Message	Date
Vijay Janapa Reddi	1eb30f5f86	fix(mlsysim): harden release QA and paper artifacts Align the MLSys·im code, docs, paper, website, workflows, and lab wheel for the 0.1.1 release. This also fixes runtime/API issues found during release review and prepares the paper PDF plus archive package.	2026-04-25 10:06:01 -04:00
Vijay Janapa Reddi	3ba3858b74	MLSys·im 0.1.0 release-prep audit (#1397)
* docs(mlsysim): release-prep audit fixes for 0.1.0
Fixes the broken links, stale numerical claims, and naming inconsistencies surfaced by the 0.1.0 release-prep review. Output of the docs site now matches what the engine actually computes, internal navigation has no unresolved targets, and the Hatch announcement banner uses an absolute URL so sub-pages render the "Get started" link correctly.
Notable changes:
- Hero example on docs/index.qmd and getting-started.qmd now reflect the actual Engine.solve(ResNet50, A100, bs=1, fp16) output (Memory / 0.54 ms / 1843).
- Update Python version requirement (3.10+) and document the editable-install limitation (Hatch sources rewrite is not supported by editables).
- Standardize the typographic brand to "MLSys·im" in the navbar, OG/Twitter metadata, and the shared cross-site dropdown.
- Add the four solvers missing from the quartodoc list (BatchingOptimizer, ForwardModel, NetworkRooflineModel, PlacementOptimizer) and surface the orphan tutorials (01_pipeline_callbacks, 02_differential_explainer, 12_design_space_exploration) in the sidebar.
- Rename every reference to the now-deleted hello_world / llm_serving / sustainability / 11_full_stack_audit tutorials to their current filenames.
- Add the missing @mlsysbook2024 entry to references.bib so whitepaper.qmd no longer logs a citeproc warning.
- Fix the CLI sample on the parent site/index.qmd card to use real model identifiers (Llama3_70B H100 --batch-size 1).
- Soften the Colab/Binder copy until launch buttons are wired in.
- Remove the duplicate "Differential Explainer" card on tutorials/index.qmd.
* release(mlsysim): add 0.1.0 release notes and runbook
- RELEASE_NOTES_0.1.0.md: GitHub-release-ready notes promoted from CHANGELOG with install/quickstart copy and a "known limitations & gotchas" section covering the editable-install issue, broken example scripts, and unpublished slide tag.
- RELEASE.md: copy-pasteable runbook for cutting a release (pre-flight check, tag, build, twine upload, docs deploy via workflow_dispatch, GitHub release, and post-release verification).
- CHANGELOG.md: corrected the test count from 334 to the actual 367 currently passing on dev.
* mlsysim: nest package layout, enable editable installs, clean lint
Restructure mlsysim into the standard nested layout (`mlsysim/mlsysim/...`) so `pip install -e .` works out of the box. The previous flat layout used a Hatch `sources = {"." = "mlsysim"}` prefix-add rewrite that the `editables` backend cannot handle, breaking editable installs entirely.
Packaging
- pyproject.toml: drop `sources` rewrite, set `packages = ["mlsysim"]`, add explicit `[tool.hatch.build.targets.sdist]` include list.
- Wheel and sdist now contain only the package and project metadata (no `tests/`, `docs/`, `examples/`, `paper/`, `vscode-ext/` leakage).
- Update `pyright.exclude` for nested layout.
- Update GitHub source links in `docs/math.qmd` and `docs/models-and-solvers.qmd` to point to `mlsysim/mlsysim/...`.
Lint configuration
- Add `[tool.ruff]` to pyproject.toml with sensible per-file ignores: `__init__.py` re-export pattern (F401/F403/F405/F811), `core/constants.py` star import from unit registry, tests/examples idioms.
- `ruff check .` reports zero issues (down from 621).
Real bug fixes uncovered by lint cleanup
- `core/solver.py`: remove unused `from pydantic import BaseModel` that was being shadowed by the local `BaseModel = ForwardModel` alias.
- `sim/simulations.py`: remove redundant local `Fleet` import that was shadowing the module-level import and triggering F823 (referenced before assignment) on the earlier `isinstance(..., Fleet)` check.
- `cli/commands/audit.py`, `cli/commands/eval.py`: narrow three bare `except:` clauses to specific exception types.
- `tests/test_sota.py`: add the missing speculative-decoding ITL assertion (`res_opt.itl < res_base.itl`); `res_base` was previously computed but never compared.
- `cli/commands/eval.py`: drop unused `is_json` local.
- `labs/components.py`: drop unused `energy` placeholder local.
Examples
- `examples/06_multi_objective_pareto.py`: rewrite around the actual `BatchingOptimizerResult` API (which has no `pareto_front` attribute); build the front explicitly by sweeping batch sizes through `ServingModel` + `TailLatencyModel`, then highlight the optimum returned by `BatchingOptimizer`.
- `examples/gemini_design_loop.py`: fix multi-line f-string syntax errors (`f"\n[…]"` instead of an embedded literal newline) so the file imports on every supported Python version.
Dev scripts
- `generate_appendix.py` and `paper/scripts/validate_anchors.py`: switch from package-relative imports to absolute `from mlsysim... import` so they run cleanly under the nested layout.
Docs / release notes
- `docs/getting-started.qmd`: replace the editable-install caveat with `pip install -e ".[dev]"` (now supported).
- `RELEASE_NOTES_0.1.0.md`: drop the three "known limitations" entries that this commit resolves (editable install, pareto example, gemini example).
- `CHANGELOG.md`: add a "Packaging & Tooling" section describing the layout change and the resolver bug fixes.
Verification
- `python -m pytest tests/` → 367 passed (was 367, no regressions).
- `ruff check .` → All checks passed.
- `pip install -e .` → succeeds; live source picked up.
- Fresh-venv wheel install + CLI smoke test → succeeds.
- `examples/06_multi_objective_pareto.py` and `examples/gemini_design_loop.py` → both exit 0.
* fix(mlsysim): repair docs build + lab test after nested-package restructure
The 0.1.0 release prep moved the package from `mlsysim/` to `mlsysim/mlsysim/` to support `pip install -e .`. Two CI jobs still depended on the old layout:
1. Docs build (`mlsysim-preview-dev`): every tutorial and zoo page used a hand-rolled `importlib.util.spec_from_file_location` block to load `<repo>/mlsysim/__init__.py` directly from source. After the restructure, that path no longer exists. Replaced the hack in 17 docs/.qmd files with a plain `import mlsysim`; the package is already pip-installed in the docs build environment via `pip install ".[docs]"`. Updated the matching guidance in `contributing.qmd`.
2. Lab static tests: `test_no_localstorage_import` hard-coded `mlsysim/labs/state.py`; updated to the new nested path `mlsysim/mlsysim/labs/state.py`.
Verified locally: `pytest labs/tests/test_static.py::TestStateImplementation` passes, and `quarto render docs/zoo/models.qmd` succeeds end-to-end.	2026-04-18 13:11:13 -04:00
Vijay Janapa Reddi	a12412190e	docs(examples): add expected output comments to all 9 runnable examples Each example now has a commented block at the bottom showing expected output from mlsysim v0.1.0. Helps students know what to expect before running, and serves as regression markers.	2026-04-01 19:12:12 -04:00
Vijay Janapa Reddi	495efb3d0b	fix(examples): rewrite 03_heterogeneous_cluster to use existing Fleet API Old version imported NodeGroup (never implemented). Rewritten to use Fleet + Node + NetworkFabric. Demonstrates 128-GPU cluster (16 nodes) with DistributedModel + EconomicsModel.	2026-04-01 18:50:31 -04:00
Vijay Janapa Reddi	481f72feac	feat(staffml): expand corpus to 7,533 published questions (86% validated) Generated 1,125 questions via gemini-2.5-flash batch generation across 1,762 gap-filling jobs, plus 235 targeted questions via Claude for thin topics. Cleaned 252 ERROR questions, fixed duplicate IDs and broken chain references. All 79 topics >= 25 questions, all 11 zones >= 250 questions, 19/19 invariant checks pass. Paper figures rebuilt with updated stats.	2026-04-01 16:03:23 -04:00
Vijay Janapa Reddi	caa6668e16	docs(mlsysim): add Level 5 Autonomy vision and self-improving loop demo	2026-03-18 17:10:02 -04:00
Vijay Janapa Reddi	a878bf5d7b	feat(mlsysim): add MCP server, agentic loop examples, and ISCA tutorial slides	2026-03-18 17:06:20 -04:00
Vijay Janapa Reddi	b6fcbcfa6c	feat: add new mlsysim examples Added examples demonstrating heterogeneous clusters (both programmatic and YAML), the data wall phenomenon, Hugging Face model import, and multi-objective Pareto optimization.	2026-03-16 16:08:32 -04:00
Vijay Janapa Reddi	dbd6a122bc	chore(mlsysim): add standard package scaffolding (Makefile, LICENSE, examples, pyproject)	2026-03-14 18:15:28 -04:00
Vijay Janapa Reddi	c9b09d5bf4	docs(root): add MLSysim to top-level ecosystem links	2026-03-13 08:26:06 -04:00
Vijay Janapa Reddi	a07a664185	refactor(mlsysim): overhaul solver API, results, and test suite Restructure solver.py with prompt caching in ServingSolver, improve results dataclass, update pipeline chaining, and modernize test suite. Replace hardcoded hardware values with constants throughout.	2026-03-12 16:04:51 -04:00
Vijay Janapa Reddi	a78f1bd8b0	feat(mlsysim): add documentation site, typed registries, and 6-solver core
Complete MLSYSIM v0.1.0 implementation with:
- Documentation website (Quarto): landing page with animated hero and capability carousel, 4 tutorials (hello world, LLM serving, distributed training, sustainability), hardware/model/fleet/infra catalogs, solver guide, whitepaper, math foundations, glossary, and full quartodoc API reference
- Typed registry system: Hardware (18 devices across 5 tiers), Models (15 workloads), Systems (fleets, clusters, fabrics), Infrastructure (grid profiles, rack configs, datacenters)
- Core types: Pint-backed Quantity, Metadata provenance tracking, custom exception hierarchy (OOMError, SLAViolation)
- SimulationConfig with YAML/JSON loading and pre-validation
- Scenario system tying workloads to systems with SLA constraints
- Multi-level evaluation scorecard (feasibility, performance, macro)
- Examples, tests, and Jetson Orin NX spec fix (100 → 25 TFLOP/s)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 15:59:51 -05:00

12 Commits