read_source() used open() without an encoding argument. On Windows,
the default codec is cp1252, which cannot decode the byte 0x90 that
appears in the multi-byte UTF-8 sequences of em-dash-like characters
in the lab markdown strings.
All 33 TestWidgetReturnCompleteness tests failed on Windows with
UnicodeDecodeError; passing PYTHONUTF8=1 made them all green.
Fix: add encoding="utf-8" to the open() call in read_source().
This commit introduces the following fixes to the Marimo labs architecture:
1. Interactive Testing: Updates test_widget.py to dynamically extract widgets, simulate clicks, and verify the interactive states hidden behind mo.stop(), ensuring execution pipelines don't crash.
2. Ledger Continuity: Fixes an issue in 4 Volume 2 labs where ledger.save() was mistakenly passed a string key (e.g. 'v2_05') instead of an integer.
3. WASM Relative Pathing: Modifies tools/build_site.sh to duplicate built Pyodide wheel assets into vol1/wheels and vol2/wheels to satisfy Pyodide's worker.js relative path resolution, which was causing the labs to hang at startup on GitHub Pages with BadZipFile errors.
Cell 0 imported INFINIBAND_NDR_BW_GBS from mlsysim.core.defaults but
returned the name IB_NDR_BW_GBS, which was never assigned — a NameError
that caused Pyodide execution to stall silently with no console error,
leaving all tabs unrendered.
- Add IB_NDR_BW_GBS = INFINIBAND_NDR_BW_GBS alias in cell 0
- Remove dead imports (GPU_MTTF_HOURS, IB_NDR_LATENCY_US,
SCALING_EFF_256GPU, OVERHEAD_PIPELINE_BUBBLE) and unused EDGE variable
- Add A100_TFLOPS_FP16 and T4_TFLOPS_FP16 from mlsysim registry so
hardware tier dropdowns and synthesis cell use live constants instead
of hardcoded magic numbers (989.0, 312.0, 25.0, 12.5, 65.0)
- Extend browser_smoke.py with Phase 4: after network-idle, verify
[role="tab"] elements are visible for any lab declaring mo.ui.tabs;
catches the #1388-class hang that passes network-idle but never
executes the tabs cell
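The cell 0 fix can be reduced to a two-line illustration; the constant value below is a placeholder, not the real mlsysim default:

```python
# stand-in for: from mlsysim.core.defaults import INFINIBAND_NDR_BW_GBS
INFINIBAND_NDR_BW_GBS = 400.0

# before: cell 0 returned IB_NDR_BW_GBS without ever assigning it,
# raising NameError when the cell ran
# after: bind the short alias explicitly so the returned name exists
IB_NDR_BW_GBS = INFINIBAND_NDR_BW_GBS
```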
* docs(mlsysim): release-prep audit fixes for 0.1.0
Fixes the broken links, stale numerical claims, and naming inconsistencies
surfaced by the 0.1.0 release-prep review. Output of the docs site now matches
what the engine actually computes, internal navigation has no unresolved targets,
and the Hatch announcement banner uses an absolute URL so sub-pages render the
"Get started" link correctly.
Notable changes:
- Hero examples on docs/index.qmd and getting-started.qmd now reflect the actual
  Engine.solve(ResNet50, A100, bs=1, fp16) output (Memory / 0.54 ms / 1843).
- Update Python version requirement (3.10+) and document the editable-install
limitation (Hatch sources rewrite is not supported by editables).
- Standardize the typographic brand to "MLSys·im" in the navbar, OG/Twitter
metadata, and the shared cross-site dropdown.
- Add the four solvers missing from the quartodoc list
(BatchingOptimizer, ForwardModel, NetworkRooflineModel, PlacementOptimizer)
and surface the orphan tutorials (01_pipeline_callbacks,
02_differential_explainer, 12_design_space_exploration) in the sidebar.
- Rename every reference to the now-deleted hello_world / llm_serving /
sustainability / 11_full_stack_audit tutorials to their current filenames.
- Add the missing @mlsysbook2024 entry to references.bib so whitepaper.qmd
no longer logs a citeproc warning.
- Fix the CLI sample on the parent site/index.qmd card to use real model
identifiers (Llama3_70B H100 --batch-size 1).
- Soften the Colab/Binder copy until launch buttons are wired in.
- Remove the duplicate "Differential Explainer" card on tutorials/index.qmd.
* release(mlsysim): add 0.1.0 release notes and runbook
- RELEASE_NOTES_0.1.0.md: GitHub-release-ready notes promoted from CHANGELOG
with install/quickstart copy and a "known limitations & gotchas" section
covering the editable-install issue, broken example scripts, and unpublished
slide tag.
- RELEASE.md: copy-pasteable runbook for cutting a release (pre-flight check,
tag, build, twine upload, docs deploy via workflow_dispatch, GitHub release,
and post-release verification).
- CHANGELOG.md: corrected the test count from 334 to the actual 367 currently
passing on dev.
* mlsysim: nest package layout, enable editable installs, clean lint
Restructure mlsysim into the standard nested layout (`mlsysim/mlsysim/...`)
so `pip install -e .` works out of the box. The previous flat layout used
a Hatch `sources = {"." = "mlsysim"}` prefix-add rewrite that the
`editables` backend cannot handle, breaking editable installs entirely.
Packaging
- pyproject.toml: drop `sources` rewrite, set `packages = ["mlsysim"]`,
add explicit `[tool.hatch.build.targets.sdist]` include list.
- Wheel and sdist now contain only the package and project metadata
(no `tests/`, `docs/`, `examples/`, `paper/`, `vscode-ext/` leakage).
- Update `pyright.exclude` for nested layout.
- Update GitHub source links in `docs/math.qmd` and
`docs/models-and-solvers.qmd` to point to `mlsysim/mlsysim/...`.
Lint configuration
- Add `[tool.ruff]` to pyproject.toml with sensible per-file ignores:
`__init__.py` re-export pattern (F401/F403/F405/F811),
`core/constants.py` star import from unit registry,
tests/examples idioms.
- `ruff check .` reports zero issues (down from 621).
Real bug fixes uncovered by lint cleanup
- `core/solver.py`: remove unused `from pydantic import BaseModel` that
was being shadowed by the local `BaseModel = ForwardModel` alias.
- `sim/simulations.py`: remove redundant local `Fleet` import that was
shadowing the module-level import and triggering F823 (referenced
before assignment) on the earlier `isinstance(..., Fleet)` check.
- `cli/commands/audit.py`, `cli/commands/eval.py`: narrow three bare
`except:` clauses to specific exception types.
- `tests/test_sota.py`: add the missing speculative-decoding ITL
assertion (`res_opt.itl < res_base.itl`) — `res_base` was previously
computed but never compared.
- `cli/commands/eval.py`: drop unused `is_json` local.
- `labs/components.py`: drop unused `energy` placeholder local.
Examples
- `examples/06_multi_objective_pareto.py`: rewrite around the actual
`BatchingOptimizerResult` API (which has no `pareto_front` attribute);
build the front explicitly by sweeping batch sizes through
`ServingModel` + `TailLatencyModel`, then highlight the optimum
returned by `BatchingOptimizer`.
- `examples/gemini_design_loop.py`: fix multi-line f-string syntax errors
(`f"\n[…]"` instead of an embedded literal newline) so the file imports
on every supported Python version.
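The f-string repair can be shown in isolation; the message text here is illustrative:

```python
step = "design loop"

# broken: a single-quoted f-string cannot contain a raw newline in its
# literal text portion, so this form is a SyntaxError:
# banner = f"
# [{step}] starting"

# fixed: use the escape sequence instead of an embedded literal newline
banner = f"\n[{step}] starting"
```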
Dev scripts
- `generate_appendix.py` and `paper/scripts/validate_anchors.py`: switch
from package-relative imports to absolute `from mlsysim... import` so
they run cleanly under the nested layout.
Docs / release notes
- `docs/getting-started.qmd`: replace the editable-install caveat with
`pip install -e ".[dev]"` (now supported).
- `RELEASE_NOTES_0.1.0.md`: drop the three "known limitations" entries
that this commit resolves (editable install, pareto example, gemini
example).
- `CHANGELOG.md`: add a "Packaging & Tooling" section describing the
layout change and the resolver bug fixes.
Verification
- `python -m pytest tests/` → 367 passed (was 367, no regressions).
- `ruff check .` → All checks passed.
- `pip install -e .` → succeeds; live source picked up.
- Fresh-venv wheel install + CLI smoke test → succeeds.
- `examples/06_multi_objective_pareto.py` and
`examples/gemini_design_loop.py` → both exit 0.
* fix(mlsysim): repair docs build + lab test after nested-package restructure
The 0.1.0 release prep moved the package from `mlsysim/` to `mlsysim/mlsysim/`
to support `pip install -e .`. Two CI jobs still depended on the old layout:
1. **Docs build (`mlsysim-preview-dev`)** — every tutorial and zoo page used
a hand-rolled `importlib.util.spec_from_file_location` block to load
`<repo>/mlsysim/__init__.py` directly from source. After the restructure,
that path no longer exists. Replaced the hack in 17 docs/.qmd files with
a plain `import mlsysim` — the package is already pip-installed in the
docs build environment via `pip install ".[docs]"`. Updated the matching
guidance in `contributing.qmd`.
2. **Lab static tests** — `test_no_localstorage_import` hard-coded
`mlsysim/labs/state.py`; updated to the new nested path
`mlsysim/mlsysim/labs/state.py`.
Verified locally: `pytest labs/tests/test_static.py::TestStateImplementation`
passes, and `quarto render docs/zoo/models.qmd` succeeds end-to-end.
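The before/after of the docs-page import fix, sketched with the stdlib `keyword` module standing in for mlsysim so the snippet runs anywhere (the docs pages pointed the loader at `<repo>/mlsysim/__init__.py`):

```python
import importlib.util
import keyword

# before: hand-rolled path-based loading, which broke the moment the
# package file moved under the nested layout
spec = importlib.util.spec_from_file_location("mlsysim_standin", keyword.__file__)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# after: a plain import suffices, because the package is pip-installed
# in the docs build environment via pip install ".[docs]"
# import mlsysim
```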
the bug class: every lab defined some widgets in a cell but returned only
a subset — typically just the terminal `partX_prediction`. marimo's
dataflow routes variables strictly through return tuples, so sliders and
dropdowns defined alongside a prediction never reached the tabs cell.
students saw "missing prediction choices" and "chart stuck, does not
move when changing items" even though all static and engine tests passed.
peter koellner (#1332) flagged it in lab_02/lab_03; sweeping the whole
tree found it in 17 of 33 labs with 175 unreturned widgets.
this pr:
1. codemods every lab: each `@app.cell` now returns the full set of
widgets it defines, alphabetical. one-line rewrites of `return (...)`
statements; no logic or labels touched.
2. rewrites the tabs cell signature in every lab to declare every widget
it references as a parameter (instead of the stale `def _(mo,
partD_prediction):` style that omitted everything else and relied on
marimo's editor to auto-sync on save — which never happens for files
edited by hand or mechanically transformed).
3. adds `TestWidgetReturnCompleteness` in labs/tests/test_static.py that
fails on any new cell defining a `mo.ui.*` widget without returning
it (render-sink names like `tabs`/`_tabs` and widgets defined
inside the tabs cell itself are excluded). 33 / 33 labs pass now;
a regression would be caught at CI time.
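a sketch of the bug class and the codemod's fix, using stand-in objects so it runs without marimo (in the labs these are mo.ui.* widgets inside @app.cell functions):

```python
class FakeWidget:
    pass

def cell_before():
    partD_slider = FakeWidget()
    partD_prediction = FakeWidget()
    # bug: only the terminal prediction is returned, so partD_slider
    # never flows through marimo's return-tuple dataflow to the tabs cell
    return (partD_prediction,)

def cell_after():
    partD_slider = FakeWidget()
    partD_prediction = FakeWidget()
    # codemod fix: return every widget the cell defines, alphabetical
    return partD_prediction, partD_slider
```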
test plan:
- labs/tests/test_static.py + tests/test_engine.py: 825 passed, 4 skipped,
  1 xfailed (up from 792 — 33 new widget-return-completeness runs)
- marimo check on lab_04 (spot-sampled): exit 0
- widget-return audit script: 17 offending labs → 0 on the relevant
check (42 remaining hits are widgets defined inside tabs cells, which
don't need to flow outward; excluded by the test)
remaining follow-up: the 42 widgets defined inside tabs cells work via
closure scope but ideally would be pulled into their own cells for
consistency. peter's lab_02 partD_data_size / partD_wireless note gets
at this; left for a separate pr since it requires moving code across
cell boundaries rather than mechanical return-tuple fixes.
marimo routes python stderr through styled console.log, not console.error,
so the previous console.error-only check missed every python-level failure
in a cell — including the exact class of bug the browser smoke was written
to catch (plotly imported before micropip.install, #1353).
fix: scan every console log line for marimo's structured exception payload
`{"type":"exception","exception_type":"...","msg":"..."}` and surface a
one-line summary like `ModuleNotFoundError: No module named 'foo'`.
verified locally against a lab deliberately broken with an `import
nonexistent_foo_bar_pkg` before micropip.install. the check fails with
`[python] ModuleNotFoundError: No module named 'nonexistent_foo_bar_pkg'`
while the healthy lab_00 export still passes with 0 captured errors.
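a sketch of the scan; the payload keys follow the description above, the helper name is hypothetical, and a real console line may wrap the JSON in surrounding text (handled here with a substring search):

```python
import json
import re

PAYLOAD_RE = re.compile(r'\{"type":"exception".*?\}')

def python_error_summaries(console_lines):
    """Surface one-line summaries for marimo's structured exception payloads."""
    summaries = []
    for line in console_lines:
        m = PAYLOAD_RE.search(line)
        if not m:
            continue
        try:
            payload = json.loads(m.group(0))
        except ValueError:
            continue
        summaries.append(f"[python] {payload['exception_type']}: {payload['msg']}")
    return summaries
```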
the first pass of browser_smoke.py passed all 4 labs in 5s each on ci —
obviously too fast. marimo's wasm export serializes pre-run cell outputs
into the static html shell, so selectors like [role="tab"] and .marimo-cell
attach to the dom before pyodide has downloaded or executed anything. that
meant the test was only proving the page loaded, not that python actually
ran. the #1353-class bug (plotly imported before micropip.install) would
have slipped through again.
three-phase check now:
1. shell selector within 30s — fast fail if export is broken
2. network-idle within 180s — pyodide runtime + every wheel
micropip pulls in must resolve before networkidle fires, so this is
the real pyodide-actually-booted signal
3. 5s settle so post-install cell work (plotly figure construction,
etc.) has time to emit console errors before we tally
captured pageerror + console.error throughout all three phases. if any
accumulate, the lab fails with actionable output.
adds a real headless chromium check to the wasm-smoke-test ci job. exports
4 representative labs, serves them with cross-origin isolation headers, and
waits 180s per lab for a marimo dom signal (tab, cell, or island). any
timeout or console error fails ci.
motivation: lab_05_dist_train shipped broken in #1353 because plotly was
imported before micropip.install() in the wasm runtime. static tests,
engine tests, and node-pyodide wheel tests all passed. only a real browser
with shared-array-buffer + coep/coop could catch it. adding lab_05_dist_train
to the smoke set makes that specific regression class impossible to ship
again silently.
design:
- labs/tests/browser_smoke.py: python http.server with coep/coop + cross-
origin-resource-policy headers, threaded. playwright.sync_api drives
chromium with --enable-features=SharedArrayBuffer. waits on selector
union 'marimo-island, [role="tab"], .marimo-cell' with 180s budget.
- pageerror + console.error handlers capture uncaught errors so failures
have actionable output, not just 'timeout'.
- single-job integration: reuses /tmp/wasm-smoke exports from the existing
step, so no artifact handoff. timeout bumped 15 -> 25 minutes.
the prior local attempt at this failed at 60s without coi headers; 180s
with proper headers matches the real boot time for labs that install
mlsysim + plotly + pint via micropip.
split the gated 5-widget check-2 cell (old line 386) into an ungated
widget-defs cell + a gated display cell, matching the canonical pattern
from lab_01 post-#1339. widgets (model_size, quantization, move_server,
faster_gpu, edge_deploy) are now always globally defined so downstream
helpers (check2empty, check2value_list) and render cells never miss
them when check1 is unanswered.
removed the _KNOWN_MULTI_LEAK_LABS grandfather mechanism in
labs/tests/test_static.py since the set went empty after this refactor.
the one-widget-per-gated-cell rule is now strictly enforced across all
33 labs.
closes #1347
adds TestWASMRuntimeImportOrder::test_runtime_packages_imported_after_micropip_install to test_static.py. flags any top-level import of plotly, pydantic, pint, pandas, or mlsysim that appears BEFORE `await micropip.install([...])` within the same cell.
this is the static regression test for the bug fixed in this PR (lab_05 importing plotly.subplots at line 55 before micropip installed plotly at line 60). existing CI completely missed this class of bug because:
- marimo check: syntax only, doesn't trace dataflow
- test_engine: runs in native python where plotly is already installed
- test_static: had no check for import-order vs install-order
- WASM smoke test: byte-size check only, doesn't actually load the page in pyodide
only a real browser caught lab_05 in production. this test catches the bug class at static-analysis time.
verified: the test fails on pre-fix lab_05 with clear message; passes on post-fix lab_05 and all 32 other labs.
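a line-based sketch of the rule the test enforces (the real implementation lives in test_static.py; this simplification ignores comments and multi-line statements):

```python
RUNTIME_PKGS = {"plotly", "pydantic", "pint", "pandas", "mlsysim"}

def first_premature_import(cell_source):
    """Return (lineno, package) for the first runtime-package import that
    appears before micropip.install in the cell source, else None."""
    install_seen = False
    for lineno, line in enumerate(cell_source.splitlines(), 1):
        stripped = line.strip()
        if "micropip.install" in stripped:
            install_seen = True
        elif stripped.startswith(("import ", "from ")):
            pkg = stripped.split()[1].split(".")[0]
            if pkg in RUNTIME_PKGS and not install_seen:
                return (lineno, pkg)
    return None
```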
13 of 14 labs migrated to Pattern C across this branch. only vol1/lab_00 remains: its check1/check2/check3 pattern is structurally different from the partX_prediction idiom and the mechanical Pattern C transformation breaks test_engine.py (needs manual per-cell refactor).
updates #1347.
the check in #1346 was overly strict. it flagged the sequential-unlock idiom that actually works (one widget per gated cell) as a violation, so 32 of 33 labs were xfailed.
the real bug (lab_01 pre-#1339) was a gated cell defining MULTIPLE widgets, which creates a cascade of undefined deps when the gate fires. a cell defining one NEXT prediction widget is fine: gate fires → user sees unlock msg → answers prediction → gate clears → next cell unlocks. that's lab_02 through lab_16's pattern and test_engine.py confirms it works (70 passed).
changes:
- renamed test to test_no_multi_widget_leak_in_gated_cell for accuracy
- raised threshold from >=1 leaked widget to >=2
- dropped the blanket xfail; the 14 labs with actual multi-widget debt are now grandfathered via _KNOWN_MULTI_LEAK_LABS pointing at #1347
- updated docstring to reflect the real rule and cite lab_01 post-#1339 as the canonical fix
verification:
- pytest labs/tests/test_static.py: 675 passed, 18 skipped (14 grandfathered + 4 pre-existing), 1 xfailed (unrelated)
- new labs and the 19 currently-clean labs are strictly enforced
- as labs get refactored, remove them from the grandfather set; when empty, bug class is closed
adds checks for two bug classes that each produced silent, long-lived failures:
1. workflow fork-safety (from #1344). any pull_request-triggered workflow that references ${{ vars.* }} or non-GITHUB_TOKEN ${{ secrets.* }} breaks silently on fork PRs because repo vars/secrets are not exposed in that context. a week of broken fork CI on #1306, #1331, #1339 before anyone noticed.
2. marimo widget-in-gated-cell (exposed by lab_01's broken state, fixed in #1339). when an @app.cell has mo.stop() AND defines mo.ui.* widgets that appear in its return tuple, those widgets don't exist until the gate unblocks, cascading "undefined dependency" failures through every cell that depends on them.
## changes
- `.github/scripts/check_workflow_fork_safety.py`: standalone python + pyyaml, parses each workflow, identifies those triggered by pull_request, flags unsafe vars/secrets references with file:line:token pointers and a fix hint. exempts secrets.GITHUB_TOKEN which is always available.
- `.github/workflows/ci-sanity.yml`: new workflow triggered on .github/ changes that runs the fork-safety check. catches contributors who don't have pre-commit installed.
- `.pre-commit-config.yaml`: wires the fork-safety script as a local pre-commit hook under a new "SECTION 3.5: CI SANITY" so it runs on workflow edits.
- `labs/tests/test_static.py`: new TestMarimoDataflow class with test_no_widget_defined_in_gated_cell. ast-based, flags cells that are both gated and define returned widgets. marked xfail for now because 32 of 33 labs currently have the pattern; the systematic refactor is separate scope. once labs are converted to the proper pattern (widget in own cell, gate cell is pure mo.stop, see vol2/lab_05_dist_train), remove the xfail.
- `labs-validate-dev.yml`, `kits-validate-dev.yml`, `mlsysim-validate-dev.yml`: the "validate build output" step now echoes the resolved env var and fails loudly with an explanation if the var is empty, rather than silently checking a wrong path. would have diagnosed #1344 in 30 seconds instead of a week.
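the core of the fork-safety rule fits in a regex sketch; the shipped script additionally parses the workflow YAML with pyyaml and restricts the scan to pull_request-triggered workflows, which this simplification skips:

```python
import re

TOKEN_RE = re.compile(r"\$\{\{\s*((?:vars|secrets)\.\w+)\s*\}\}")

def unsafe_references(workflow_text):
    """Flag ${{ vars.* }} and non-GITHUB_TOKEN ${{ secrets.* }} uses,
    which resolve to empty strings in the fork-PR context."""
    hits = []
    for lineno, line in enumerate(workflow_text.splitlines(), 1):
        for token in TOKEN_RE.findall(line):
            if token != "secrets.GITHUB_TOKEN":
                hits.append((lineno, token))
    return hits
```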
## verification
- fork-safety check: ok, 47 workflows scanned, 8 pull_request-exposed, 0 violations
- marimo check: ok across all 33 labs
- pytest labs/tests/test_static.py: 656 passed, 4 skipped, 33 xfailed, 1 xpassed
- vol2/lab_05_dist_train is xpassed (the canonical reference lab with the proper pattern)
Point all Marimo labs at ../../wheels from labs/volN/ so browser loads
resolve the wheel next to the repo root. Track mlsysim 0.1.0 wheel with a
narrow .gitignore exception.
Harden test_static so ../../../wheels/... cannot satisfy the wheel check
via substring matching.
New test_protocol.py validates 6 protocol invariants from PROTOCOL.md:
- Invariant 1: constants sourced from mlsysim registries (not hardcoded)
- Invariant 4: multi-part tabbed structure (4-5 parts + synthesis)
- Invariant 5: multiple deployment contexts (2-3 hardware tiers)
- Zone structure (4 zones: opening, widgets, tabs, ledger)
- Ledger integration (ledger.save with correct chapter number)
- Pedagogical flow (predictions per part, mo.stop gates, stakeholder msgs)
Known gaps surface as xfail, not hard failures — provides a quality
dashboard without blocking CI while labs are brought up to protocol.
Vol2 labs are still in development and don't yet use mo.ui.tabs.
Mark test_has_tabs as xfail instead of hard failure. Also skip
lab_00 for plotly import check and xfail Cortex-M7 check.
Complete lab curriculum rebuild:
- 16 Vol1 + 16 Vol2 labs (33,368 lines across 32 files)
- Each lab has 4-5 pedagogically grounded parts + synthesis
- Parts proposed by agents reading actual chapter QMD files
- Ed-tech review identified and fixed: redundancies, time overruns, mlsysim gaps
- V2-03 + V2-06 merged into "Communication at Scale"
- 17 old/superseded files deleted (pre-merger, pre-renumber)
Structural fixes applied:
- V1-01 Part C replaced (Silent Decay -> Triad Across Targets)
- V1-03 grounded in mlsysim (Engine.solve for OOM, Engine.sweep for configs)
- V1-09 Part C replaced (Curriculum Learning -> Preprocessing Tax)
- V1-13 Part E dropped (redundant with Lab 10)
- V1-15 Part C dropped (TCO not fairness-specific), refocused
- V2-12 model extraction dropped, restructured around privacy cost
- V2-16 capstone cut to 4 parts, Design Ledger fallbacks added
- All 15-min parts trimmed to 12 min
Test infrastructure added:
- labs/tests/ with 3-level pytest suite (static, engine, widget)
- CI workflow updated with pytest stages
- pytest added to labs/requirements.txt