- Rename 409 unused image files with _ prefix across vol1 (272) and
vol2 (137) so they are visually identifiable without being deleted
- Restore polished TikZ figures from main branch into vol2 chapters:
fault_tolerance, edge_intelligence, security_privacy, distributed_training,
responsible_ai, sustainable_ai, robust_ai
- Remove all tikz-source backup blocks (0 remaining across vol2)
- Prefix 33 SVG files superseded by restored TikZ with _
- Add GreenL0, chains, shapes.arrows, decorations.pathreplacing to diagram.yml
Uncomments all chapters, parts, frontmatter, and appendices in both Volume 1 and Volume 2 Quarto PDF configuration files. This ensures that the complete book content is included when generating PDF outputs.
Replaces embedded TikZ code with external SVG image references across various chapters. This change enhances rendering performance, reduces document file size, and improves compatibility.
Includes minor text formatting adjustments for numerical values and symbols.
Converts numerous inline TikZ diagrams to external SVG files across the book's content. This improves rendering performance, streamlines figure management, and ensures consistent visual presentation.
Enhances CLI validation by:
- Ignoring cross-reference IDs when checking for multiplication to prevent false positives.
- Stripping inline math spans before currency checks to avoid misinterpreting mathematical expressions as currency.
- Applying hex literal exclusions to pre-processed lines for more accurate validation.
Adds optional Matplotlib import to the plotting module for improved flexibility in environments where the library may not be available.
Standardizes the representation of numerical multipliers and ranges across Quarto documents. This change improves the typographic rendering of expressions like `$10\times$` and `1.3--$2\times$`, enhancing consistency and readability of the book's content.
Include pandas in validate-dev fallback dependencies to satisfy mlsysim.sim imports during book-mlsys-test-units when base requirements install partially fails.
Add fallback hook dependencies in validate-dev and apply trailing-whitespace fixes to lab plan files so pre-commit no longer fails on auto-modifications.
Switch container/baremetal/validate/preview/live flows to vol1+vol2 artifacts, keep baremetal in dev validation, and add stable single-book navbar link.
Removes a sentence that summarized the chapter's structure.
This change simplifies the immediate opening, aligning with broader content organization efforts.
Allows a single `@all-contributors` comment to add or update a contributor across multiple projects simultaneously.
Updates the workflow to:
- Detect multiple projects from explicit mentions in the trigger comment.
- Iterate over all detected projects to update their respective `.all-contributorsrc` files and project `README.md` tables.
- Adapt commit messages and bot replies to reflect multi-project changes.
This improves efficiency for managing contributors in multi-project repositories by reducing repetitive commands.
- Add -v2 to cache key to invalidate stale R package cache
- Add ggrepel to verification step to catch missing packages early
- Fixes hw_acceleration.qmd build failure (fig-processor-trends chunk)
Moves the 'Scaling the Machine: From Node to Fleet' section to a more logical position
within the chapter, following the discussion on defining ML systems.
Refines various sentences for improved clarity, conciseness, and a more formal,
impersonal tone. Adds an introductory sentence to better outline the chapter's
structure and movements.
Second agent pass matched the LABS_SPEC brief more precisely:
- Act I renamed to 'Design Ledger Archaeology' — reads actual ledger history,
computes per-domain constraint hit rate, renders radar chart + bar chart
- Act II is 'The Final Architecture Challenge' with 6 simultaneous scorecard
constraints (accuracy, P99, DP, adversarial, carbon, fault tolerance)
- Stakeholder scenario: Chief Architect / Principal Engineer promotion framing
- Medical fleet (1000 hospitals, 100k inferences/day) as the deployment target
- Curriculum journey timeline grid (all 33 labs) in closing section
- All constants match spec: FLEET_SIZE_NODES=1000, COAL_CI_G_KWH=820, etc.
Add all Vol1 (labs 01-16) and Vol2 (labs 01-17) interactive Marimo labs
as the first full first-pass implementation of the ML Systems curriculum labs.
Each lab follows the PROTOCOL 2-Act structure (35-40 min):
- Act I: Calibration with prediction lock → instruments → overlay
- Act II: Design challenge with failure states and reflection
Key pedagogical instruments introduced progressively:
- Vol1: D·A·M Triad, Iron Law, Memory Ledger, Roofline, Amdahl's Law,
Little's Law, P99 Histogram, Compression Frontier, Chouldechova theorem
- Vol2: NVLink vs PCIe cliff, Bisection BW, Young-Daly T*, Parallelism Paradox,
AllReduce ring vs tree, KV-cache model, Jevons Paradox, DP ε-δ tradeoff,
SLO composition, Adversarial Pareto, two-volume synthesis capstone
All 35 staged files pass AST syntax verification (36/36 including lab_00).
Also includes:
- labs/LABS_SPEC.md: authoritative sub-agent brief for all lab conventions
- labs/core/style.py: expanded unified design system with semantic color tokens
Complete rewrite of lab_00_introduction.py with four sections:
1. The 95% Problem — ML systems vs ML framing (not models, infrastructure)
2. Physical Constraints — speed of light, thermodynamics, memory physics
3. Four Deployment Regimes — Cloud/Edge/Mobile/TinyML constraint walls
4. Interface Orientation — live cockpit tour (tabs, levers, prediction lock, MathPeek)
Each concept block gates the next via mo.stop() with structured checks:
- Check 1: radio MCQ (silent degradation / ML systems domain)
- Check 2: multiselect (AV latency / speed-of-light constraint)
- Check 3: radio scenario (ICU sensor / TinyML constraint analysis)
Interface tour uses real mo.ui components (dropdown, slider, accordion)
so students build motor memory for the cockpit before Lab 01 content begins.
Design Ledger initialized at completion with deployment context + check answers.
Fix: DesignLedger import corrected to labs.core.state (not mlsysim.sim.ledger).
Verified: Exit 0 under Python 3.13 with marimo 0.19.6.
- book/quarto/mlsys/__init__.py: add repo-root sys.path injection so
mlsysim is importable when scripts run from book/quarto/ context
- book/quarto/mlsys/{constants,formulas,formatting,hardware}.py: new
compatibility shims that re-export from mlsysim.core.* and mlsysim.fmt
- mlsysim/viz/__init__.py: remove try/except for dashboard import; use
explicit "import from mlsysim.viz.dashboard" pattern instead
- .codespell-ignore-words.txt: add "covert" (legitimate security term)
- book/tools/scripts/reference_check_log.txt: delete generated artifact
- Various QMD, bib, md files: auto-formatted by pre-commit hooks
(trailing whitespace, bibtex-tidy, pipe table alignment)
Moves the mlsysim package from book/quarto/mlsysim/ to the repo root
so it is importable as a proper top-level package across the codebase.
Key changes:
- mlsysim/fmt.py: new top-level module for all formatting helpers (fmt,
sci, check, md_math, fmt_full, fmt_split, etc.), moved out of viz/
- mlsysim/viz/__init__.py: now exports only plot utilities; dashboard.py
(marimo-only) is no longer wildcard-exported and must be imported
explicitly by marimo labs
- mlsysim/__init__.py: added `from . import fmt` and `from .core import
constants`; removed broken `from .viz import plots as viz` alias
- execute-env.yml: fixed PYTHONPATH from "../../.." to "../.." so
chapters resolve to repo root, not parent of repo
- 51 QMD files: updated `from mlsysim.viz import <fmt-fns>` to
`from mlsysim.fmt import <fmt-fns>`
- book/quarto/mlsys/: legacy shadow package contents cleaned up;
stub __init__.py remains for backward compat
- All Vol1 and Vol2 chapters verified to build with `binder build pdf`
Offset the 2nd bezier control point x from the endpoint x on all four
Node 1 ring arcs so orient="auto" computes a diagonal arrival angle
instead of a straight vertical arrowhead.
- Adds standardized callout-definition blocks with bold term + clear definition
to all Vol.2 chapters (distributed training, inference, network fabrics, etc.)
- Fixes caption_inline_python errors: replaces Python inline refs in table
captions with static text in responsible_engr, appendix_fleet, appendix_reliability,
compute_infrastructure
- Fixes undefined_inline_ref errors: adds missing code fence for PlatformEconomics
class in ops_scale.qmd; converts display math blocks with Python refs to prose
- Fixes render-pattern errors: moves inline Python outside $...$ math delimiters
in conclusion, fleet_orchestration, inference, introduction, network_fabrics,
responsible_ai, security_privacy, sustainable_ai, distributed_training
- Fixes dropcap errors: restructures drop-cap sentences in hw_acceleration and
nn_architectures to not start with cross-references
- Fixes unreferenced-label errors: removes @ prefix from @sec-/@tbl- refs inside
Python comment strings in training, model_compression, ml_systems
- Adds clientA to codespell ignore words (TikZ node label in edge_intelligence)
- Updates mlsys constants, hardware, models, and test_units for Vol.2 calculations
- Updates _quarto.yml and references.bib for two-volume structure
Refactors chapter discovery across CLI commands to use a single, canonical source of truth: the volume's Quarto PDF configuration file.
Introduces a new `get_chapters_from_config` function in `core/discovery.py` that parses the `_quarto-pdf-{volume}.yml` to derive the ordered list of testable chapter stems. This ensures consistent chapter order for `build` and `debug` operations, reducing duplication and improving maintainability.
Updates `build.py` and `debug.py` to delegate all chapter list retrieval to this new centralized method within `ChapterDiscovery`. Also enhances chapter QMD file location to support shared content paths.
Updates Quarto configurations to reorder, add, and rename appendices across all output formats for both volumes, and includes previously commented chapters in PDF builds.
Encapsulates Python calculation logic and exported variables within dedicated classes across numerous Quarto documents, improving modularity, maintainability, and clarity of in-text references.
Refines MLOps definitions, corrects TCO calculation with distinct inference GPU rates, adjusts distributed training scaling scenarios (e.g., commodity network bandwidth), and clarifies network fabric details (e.g., FEC latency).
Renames Volume II parts from V-VIII to I-IV, updating all corresponding references in the about section, volume introduction, and individual part principle files.
Refines various textual elements across the book for improved conciseness and readability. Cleans up markdown formatting, including removal of unnecessary horizontal rules and empty code blocks. Adjusts footnote placement for better consistency.
Adds new reliability calculation parameters and corrects a tikz diagram rendering issue.
Transforms the initial lab from a static manifesto into an interactive orientation. Introduces three Knowledge Assessment Tasks (KATs) designed to calibrate architectural intuition:
- KAT 1: Explores the cost implications of discovering design constraints late.
- KAT 2: Illustrates non-linear physical scaling laws through processor power consumption.
- KAT 3: Guides users to select a career specialization, defining their long-term mission and binding physical constraints.
Integrates new interactive components and a persistent Design Ledger HUD to enhance user engagement and track progress through the curriculum.
Establishes the foundational content for a structured ML engineering curriculum, covering topics from single-node physics to fleet-scale orchestration.
Adds detailed mission plans for 16 labs in Volume 1 and 17 labs in Volume 2. Each plan outlines chapter context, core invariants, narrative arcs, 3-part missions with objectives, interactive workbenches, and reflection questions to define comprehensive learning experiences.