- hw_acceleration: escape % in callout title 'The Five-Percent Utilization Mystery'
(LaTeX treats % as comment char in div attribute titles, truncating the box)
- data_selection: escape % in callout title 'The Ninety-Nine Percent Sparsity Trap'
(same \fbxSimple runaway argument error)
- model_compression: remove 28-line orphaned stale class body (merge artifact);
add missing mat_dim=4096 to LowRankFactorization class parameters
- model_serving: move littles-law-calc code cell before the prose that references
its exported variables (serving_qps_str etc. used before they were defined)
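
The `%` escaping in the first two items can be sketched as follows. This is a minimal illustration, not the script used for the fix: the function name `escape_latex_percent` is hypothetical, and the sample title containing a literal `%` is made up for the demonstration.

```python
import re

# Hypothetical helper (not from the repository): escape every '%' in a
# callout title that is not already preceded by a backslash.
UNESCAPED_PERCENT = re.compile(r"(?<!\\)%")


def escape_latex_percent(title: str) -> str:
    """Return the title with bare '%' replaced by '\\%'."""
    # In the replacement string, r"\\%" yields a single backslash plus '%'.
    return UNESCAPED_PERCENT.sub(r"\\%", title)


if __name__ == "__main__":
    sample = "The 5% Utilization Mystery"  # assumed example title
    print(escape_latex_percent(sample))
```

The lookbehind keeps the function idempotent, so running it over an already-fixed title changes nothing. It does not handle an escaped backslash directly before a `%`, which would need a fuller parser.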
Fixes a series of LaTeX/Pandoc compilation errors across Vol2 so every
chapter builds cleanly with `binder build pdf <chapter> --vol2 -v`.
Key fixes applied:
- Citations removed from fig-caps, table cells, and footnote definitions
(Quarto 1.8 `marginCitePlaceholderInlineWithProtection` bug with
`citation-location: margin`); citations restored to surrounding prose
- TikZ nodes with `\\` line breaks given `align=center/left` to exit
LR mode (robust_ai, sustainable_ai)
- `\argmax` → `\operatorname{arg\,max}` (undefined in amsmath)
- `\texorpdfstring` wrapping for math in section headers (notation)
- Multi-line `{python}` inline expressions in grid tables converted to
pipe tables (appendix_communication)
- Math expressions split across grid table row boundaries converted to
pipe tables to avoid `\{\beta\}\$` rendering corruption
- Stale class references (`ImageNetBottleneck`, `PrefetchBuffer`,
`CheckpointStorage`) fixed → `StorageEconomics.*` (data_storage)
- Missing `batch_per_gpu` factor in aggregate bandwidth formula (data_storage)
- Duplicate `xytext` keyword in `ax.annotate()` call (edge_intelligence)
- `&lt;` HTML entity mixed with unescaped `$` in table cells fixed (security_privacy)
- Incorrect `check()` invariant corrected (appendix_fleet)
Refines book abstracts, table of contents, and diagram configurations for improved clarity and structure.
This commit enhances the descriptions of both Volume I and Volume II, emphasizing their respective focuses. It also introduces a framework decision tree to guide the selection of parallel training strategies and inference frameworks, and diagrams for visualizing hardware constraints.
Creates a YAML configuration file specifically for generating the PDF version of Volume II: Machine Learning Systems at Scale.
This configuration defines the project structure, book metadata (title, author, abstract), chapter organization, and PDF-specific settings like cover page design, table of contents depth, and inclusion of LaTeX files for custom styling.
This allows for independent building and customization of the PDF output for Volume II.
Improves the data pipeline debugging flowchart by adding visual cues.
These cues help to highlight the type of data issue being investigated
and make the flowchart easier to understand.
Enhances the conclusion of Volume 1, improving clarity and flow by:
- Refining wording and structure for better readability
- Clarifying the connection between theoretical invariants and practical applications
- Adding information for clarity and context
Audits and refactors Volume 2 chapters to ensure all Python calculation cells adhere to the P.I.C.O. (Parameters, Invariants, Calculation, Outputs) standard.
- Consolidates storage specifications and economics into StorageSetup and StorageEconomics classes in data_storage.qmd.
- Refactors collective communication math into the AllReduceCost class in collective_communication.qmd.
- Standardizes infrastructure and performance engineering setups in compute_infrastructure.qmd and performance_engineering.qmd.
- Corrects NameErrors and missing imports in benchmarking and platform ROI calculations.
- Ensures all prose variables are correctly exported and scoped within Safe Class Namespaces to prevent global pollution and ensure mathematical consistency across the fleet-scale narrative.
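
A P.I.C.O. cell scoped inside a class namespace might look roughly like the sketch below. The class name `RingAllReduceSketch`, the numbers, and the attribute names are all hypothetical, not taken from the chapters. The formula is the standard bandwidth term for ring all-reduce, used here only as a convenient example.

```python
class RingAllReduceSketch:
    """Hypothetical P.I.C.O.-style cell; names and values are illustrative."""

    # Parameters (assumed values, decimal SI units)
    num_gpus = 8
    message_bytes = 1e9        # 1 GB gradient buffer
    link_bytes_per_s = 50e9    # 50 GB/s per-link bandwidth

    # Invariants: fail at render time if the setup is inconsistent
    assert num_gpus >= 2, "all-reduce needs at least two participants"
    assert link_bytes_per_s > 0, "bandwidth must be positive"

    # Calculation: ring all-reduce bandwidth term, 2(N-1)/N * S / B
    transfer_s = 2 * (num_gpus - 1) / num_gpus * message_bytes / link_bytes_per_s

    # Outputs: pre-formatted strings for inline prose references
    transfer_ms_str = f"{transfer_s * 1e3:.1f}"


if __name__ == "__main__":
    print(RingAllReduceSketch.transfer_ms_str)
```

Because every name lives on the class, prose would reference `RingAllReduceSketch.transfer_ms_str` rather than a module-level variable, which is one way to avoid the global-namespace collisions the commit describes.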
Audits all Volume 1 chapters to identify and repair structural errors in Python calculation cells introduced during the P.I.C.O. refactor.
- Consolidates redundant memory calculations and fixes missing imports in nn_computation.qmd.
- Refactors AttentionMemory in nn_architectures.qmd to resolve NameErrors and duplicated blocks.
- Cleans up QuantizationSpeedup and restores MobileNetCompressionAnchor in model_compression.qmd.
- Resolves missing Models and Hardware imports in benchmarking.qmd.
- Updates LighthouseModels in ml_systems.qmd with missing variables for MobileNet and KWS.
- Corrects indentation and restores structural integrity across all Volume 1 calculation scenarios to ensure valid rendering and mathematical consistency.
Restructures Volume II to improve narrative flow and address scale impediments, including reordering of sections and addition of introductory material.
Introduces "Master Map" to guide readers through the volume's layered progression.
Adds callout notes to bridge concepts between sections.
Moves references.qmd to backmatter and adjusts chapter organization for clarity.
Updates hardware parameterization and network performance modeling within code blocks.
Deepens understanding of abstract principles by adding concrete examples and numerical anchors.
These additions provide tangible context and illustrate the practical implications of the discussed concepts, which aids comprehension and application. They also add context on constraints, economics, and performance.
Updates concept map YAML files for various chapters in volume 1, including introduction, benchmarking, data engineering, data selection, frameworks, hardware acceleration, ML systems, MLOps, ML workflow, model serving, NN architectures, NN computation, optimizations, responsible engineering, and training.
Replaces the old YAML structure with a new structure organized around primary concepts, secondary concepts, technical terms, methodologies, and formulas. The change emphasizes the core concepts and their relationships within each chapter. The generated dates are updated to reflect a future date.
Insert thesis declarations, spine reconnections, and evidence elevations
that make the book's central claim explicit: ML systems engineering is a
distinct discipline governed by permanent physical laws. No restructuring
or deletions; insertions only, matching the surrounding rhetorical register.
- Remove 17 empty per-chapter .bib files (all contained only newlines)
- Consolidate HTML and EPUB configs to use central backmatter/references.bib
(matching the pattern already used by PDF configs and Vol1)
- Rename responsible_engineering/ to responsible_ai/ for consistency with
robust_ai/ and sustainable_ai/ in Part IV: The Responsible Fleet
- Update all 4 Quarto config files with new path
Major structural reorganization of Volume II:
- New 4-part structure: The Fleet, Distributed ML, Deployment at Scale, The Responsible Fleet
- Fleet Stack framework (Infrastructure/Distribution/Serving/Governance) replaces Systems Sandwich
- Renamed and reorganized 8 chapter directories to match new structure
- Absorbed ai_good/ into responsible_engineering and emerging_challenges/ into introduction
- Wrote/expanded 6 new chapters (collective_communication, compute_infrastructure,
fleet_orchestration, network_fabrics, data_systems, performance_engineering)
- Fixed 116+ broken @sec- cross-references across all 16 chapters and glossary
- Updated all 4 Quarto config files, part-openers, and summaries.yml
- Added \mlfleetstack LaTeX command for PDF rendering
- Removed old 5-part HTML artifacts and macOS resource fork files
- Converted grid tables to pipe tables in fleet_orchestration
- Fixed inline Python in display math blocks in collective_communication
- Resolved duplicate tbl-tco-comparison label and stale part key reference
- Add war story callout definition in custom-numbered-blocks.yml
- Create war story icon in all three formats (SVG, PNG, PDF) matching
the 64x64 stroke-only style used by all other callout icons
- Add war story bibliography and PDF config entry
- Add first war story ("The Quadratic Wall") in nn_architectures
- Include icon conversion utility script
Aligns the Distributed Training chapter with the Volume 2 'Systems Sandwich'
framework, establishing it as the 'Operational Layer' of the Machine Learning Fleet.
Key changes:
- Refactors 'Purpose' and 'Learning Objectives' to use rhetorical pivots and
focus on the 'Physics of the Fleet'.
- Updates Python setup cell to use the 'Safe Class Namespace' pattern (P.I.C.O.)
and adds Archetype A (GPT-4) constants.
- Rewrites 'Multi-Machine Scaling Fundamentals' to center on the
'Communication-Computation Ratio' and the 'Law of Distributed Efficiency'.
- Cross-references the Volume 2 Introduction definitions to create a cohesive narrative.
Aligns the rhetorical style and quantitative rigor of the Volume 2
Introduction with the established Volume 1 standards.
Introduces the "Machine Learning Fleet" narrative as the central
engineering challenge of Volume 2, shifting from single-node
optimization to cluster-scale orchestration.
Key changes:
- Establishes the "Law of Distributed Efficiency" and "CI Ratio"
(Communication Intensity) as new quantitative frameworks.
- Defines the "Reliability Gap" to address statistical failure
certainty in massive clusters.
- Refactors all TikZ diagrams (Systems Sandwich, Roadmap, AI Triad)
to use project-standard colors and Helvetica font.
- Updates the "Lighthouse Archetypes" to focus on throughput,
latency, and resource-bound fleet challenges.
- Implements P.I.C.O. math patterns for fleet-scale calculations.
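
The commit does not give the formula behind the "Reliability Gap". One common way to express "statistical failure certainty" is the probability that at least one of N nodes fails, assuming independent failures. The sketch below uses that assumption; the function name and the sample numbers are hypothetical.

```python
def fleet_failure_probability(per_node_failure_prob: float, num_nodes: int) -> float:
    """P(at least one failure) = 1 - (1 - p)^N, assuming independent nodes."""
    if not 0.0 <= per_node_failure_prob <= 1.0:
        raise ValueError("per_node_failure_prob must be in [0, 1]")
    if num_nodes < 0:
        raise ValueError("num_nodes must be non-negative")
    return 1.0 - (1.0 - per_node_failure_prob) ** num_nodes


if __name__ == "__main__":
    # Assumed numbers: a 0.1% per-node failure chance over some time window.
    for nodes in (1, 100, 10_000):
        print(nodes, round(fleet_failure_probability(0.001, nodes), 5))
```

Under this model a failure rate that is negligible on one machine approaches certainty at cluster scale, which appears to be the point the definition is meant to make.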
Renames and restructures the framework evolution section to
"The Ladder of Abstraction," emphasizing the problem-solving
nature of each abstraction layer.
Clarifies the role of each layer (BLAS/LAPACK, NumPy,
Deep Learning Frameworks) in solving specific problems related
to performance, usability, and differentiation, respectively.
Highlights the trade-offs between productivity and transparency
as we move up the abstraction ladder.
Refines the notation section to explicitly state the use
of decimal SI prefixes for data, memory, and compute.
Updates wording for clarity and consistency, specifically
addressing units, storage, and compute contexts.
Ensures that the book uses only decimal SI prefixes and
specifies the formatting of numbers and units.
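
As a rough illustration of the decimal-prefix convention (1 kB = 1000 B, not 1024 B), a formatter could look like the sketch below. The helper name `format_si`, the prefix spellings, and the one-decimal default are assumptions, not the book's actual formatting rules.

```python
# Decimal SI prefixes, each a factor of 1000 apart (assumed spelling).
SI_PREFIXES = ["", "k", "M", "G", "T", "P", "E"]


def format_si(value: float, unit: str, digits: int = 1) -> str:
    """Format a quantity with a decimal SI prefix, e.g. 80e9 B as '80.0 GB'."""
    magnitude = float(value)
    index = 0
    # Divide by 1000 (never 1024) until the magnitude drops below 1000.
    while abs(magnitude) >= 1000 and index < len(SI_PREFIXES) - 1:
        magnitude /= 1000
        index += 1
    return f"{magnitude:.{digits}f} {SI_PREFIXES[index]}{unit}"


if __name__ == "__main__":
    print(format_si(80e9, "B"))
    print(format_si(1024, "B"))
```

The second call shows the practical consequence of the convention: 1024 bytes is reported as roughly one kilobyte rather than exactly one.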
- debugCommands: prompt for format (PDF, HTML, EPUB) in Build All Chapters (Parallel)
- parallelDebug: clearer success/fail messages, Open Reports Folder, REPORT.md header
- README: document volume + format selection for parallel builds
Checkpoint the branch-wide content/config revisions together with workbench enhancements so chapter rendering and developer workflows stay aligned. This captures the current validation-driven formatting and parallel build/debug improvements in one commit.
Use a file watcher to detect when the PDF is created/modified during
build, then automatically open it in VS Code. Build still runs in the
visible terminal so users see progress. Also fix LaTeX comma-in-title
bug in foldbox.tex by bracing the title argument inside tcolorbox options.
Convert all remaining lowercase 'x' used as multiplication (e.g.,
"1000x faster") to $\times$ across 17 vol2 chapters. These were
flagged by the new lowercase_x_multiplication validator check.
Simplifies the validator regex from a fragile word-list approach to a
broader pattern matching digit-x-lowercase (e.g., `\dx\s+[a-z]`) which
naturally excludes hardware counts (8x A100) and hex literals (0x61).
Includes the conversion script in _archive.
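
The behavior of the example pattern can be checked with a short sketch. Only the regex itself comes from the message above; the function name `find_lowercase_x` and the sample strings are hypothetical.

```python
import re

# Pattern quoted in the commit message: digit, 'x', whitespace, lowercase letter.
LOWERCASE_X = re.compile(r"\dx\s+[a-z]")


def find_lowercase_x(text: str) -> list[str]:
    """Return the matched fragments that look like 'Nx word' multiplication."""
    return [match.group(0) for match in LOWERCASE_X.finditer(text)]


if __name__ == "__main__":
    for sample in ("runs 1000x faster", "an 8x A100 node", "byte 0x61"):
        print(repr(sample), "->", find_lowercase_x(sample))
```

The hardware count is skipped because an uppercase letter follows the space, and the hex literal is skipped because a digit follows the `x` directly. A multiplier followed immediately by punctuation would also be skipped by this particular pattern.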
Convert all Unicode × (U+00D7) to LaTeX $\times$ in prose, tables, and
math contexts across both volumes. Unicode × is preserved only inside
fig-alt text for accessibility screen readers. One instance inside a
plain markdown backtick code span (frameworks.qmd) was reverted to
Unicode × since LaTeX doesn't render in code spans.
Updates validate.py with a new lowercase-x-as-multiplication check and
refines the latex_adjacent warning to distinguish _str variables (safe)
from raw inline Python. Updates validate_inline_refs.py comments to
reflect the new convention. Includes the conversion script in _archive.
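
A simplified version of the conversion rule (replace Unicode × except in code spans and fig-alt text) might look like the sketch below. It is not the archived conversion script. The function name is hypothetical, fig-alt detection is reduced to a substring test on the line, and the sketch does not handle a × that already sits inside `$...$` math.

```python
import re

# Split on inline code spans; the capturing group keeps them in the result.
CODE_SPAN = re.compile(r"(`[^`]*`)")


def convert_times(line: str) -> str:
    """Replace U+00D7 with $\\times$ outside code spans and fig-alt lines."""
    if "fig-alt" in line:
        return line  # keep Unicode for screen readers
    parts = CODE_SPAN.split(line)
    converted = []
    for index, part in enumerate(parts):
        is_code_span = index % 2 == 1  # captured spans land at odd indices
        if is_code_span:
            converted.append(part)
        else:
            converted.append(part.replace("\u00d7", r"$\times$"))
    return "".join(converted)


if __name__ == "__main__":
    print(convert_times("a 4\u00d7 speedup with `2\u00d72` kernels"))
```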
Audited all 52 backward-looking prose references ("recall", "as we saw",
"introduced earlier") across all 16 Vol I chapters. Found 46 valid and
6 with issues; fixed the 4 actionable ones:
- benchmarking: fix dual attribution for energy-movement claim
- hw_acceleration: fix imprecise "100x" energy gap to "orders-of-magnitude"
- hw_acceleration: change "introduced in" to "mentioned in" for HBM ref
- conclusion: correct invariant attribution from data_engineering to Part I
Audit report: .claude/_reviews/2026-02-15_backward-reference-audit.yaml
Add a persistent health indicator to the extension: a status bar item
that shows pass/warn/error at a glance, plus a health summary node at
the top of the Pre-commit tree view. Fast in-process TypeScript checks
run on file save, editor switch, and startup (<100ms per file).
Checks: duplicate labels, unclosed div fences, missing figure alt-text,
and unresolved in-file cross-references.
- Add src/validation/qmdChecks.ts with four pure check functions
- Add src/validation/healthManager.ts with central status tracker
- Wire HealthManager into extension.ts with status bar and event hooks
- Add expandable health summary node to PrecommitTreeProvider
- Register showHealthDetails command in package.json
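
The extension's checks are written in TypeScript. For illustration only, the unclosed-div-fence check can be sketched in Python as below. The function name and the simplifying rule (a fence line with trailing text opens a div, a bare fence closes the most recent one) are assumptions, not the logic in `qmdChecks.ts`.

```python
import re

# A div fence line: three or more colons, optionally followed by attributes.
FENCE = re.compile(r"^\s*(:{3,})\s*(.*)$")


def unclosed_div_fences(text: str) -> list[int]:
    """Return 1-based line numbers of div openers that are never closed."""
    open_lines: list[int] = []
    in_code_block = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_code_block = not in_code_block
            continue
        if in_code_block:
            continue  # ignore anything inside fenced code
        match = FENCE.match(line)
        if not match:
            continue
        if match.group(2).strip():
            open_lines.append(lineno)   # opener: fence with attributes
        elif open_lines:
            open_lines.pop()            # bare fence closes the latest opener
    return open_lines


if __name__ == "__main__":
    print(unclosed_div_fences("::: {.callout-note}\nbody text\n"))
```

A single linear pass with a stack like this stays cheap enough to run on every save, which is consistent with the latency target stated above.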
Second pass catching ~37 additional instances missed in the initial
cleanup, including prose in frameworks, glossary definitions, footnotes,
fig-caps, fig-alts, table cells, and callout content.
All remaining `Nx` patterns are now exclusively inside Python code
blocks (comments, docstrings, f-strings) or are mathematical variable
expressions (e.g., derivative = 2x), which are correct as-is.
Integrate figure rendering into the binder CLI so plots can be previewed
without a full Quarto build. Extracts Python code blocks with fig-* labels
from QMD files, renders them to PNG, and outputs a browsable gallery at
_output/plots/<chapter>/. Also fixes the package import chain so `binder`
works correctly as an installed entry point.
- Add book/cli/commands/render.py with RenderCommand class
- Wire into main.py with help table entry and command dispatch
- Add matplotlib>=3.7.0 to pyproject.toml dependencies
- Add book/quarto/_output/ to .gitignore
- Archive standalone render_figures.py to _archive/
Every callout-takeaways block across Vol 1 now has a title attribute
that captures the chapter's core insight rather than repeating the
chapter name. Titles are drawn from each chapter's purpose question
or central thesis, answering "what should a student remember six
months from now?" Examples: "Constraints Drive Architecture" (Intro),
"Perfectly Available, Perfectly Wrong" (ML Ops), "Architecture Is
Infrastructure" (Network Architectures).
Typed references (@tbl-, @fig-, @sec-, @lst-, @eq-) and label
definitions ({#tbl-...}, {#fig-...}) were all rendering in generic
blue because overlapping decorations overrode the typed colors.
Fix by collecting typed matches first and excluding them from
generic/structural buckets. Also fix label-definition regexes to
match labels with trailing attributes (e.g. {#fig-foo fig-env=...}).
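
A label-definition pattern that tolerates trailing attributes could look like the sketch below. It is written in Python for illustration, whereas the extension itself is TypeScript, and the exact regex used there is not shown in this log, so the pattern and the function name are assumptions.

```python
import re

# Capture the typed label, then allow any trailing attributes before '}'.
LABEL_DEF = re.compile(r"\{#((?:fig|tbl|sec|lst|eq)-[\w-]+)[^}]*\}")


def find_label_definitions(line: str) -> list[str]:
    """Return typed labels defined on a line, e.g. ['fig-foo']."""
    return LABEL_DEF.findall(line)


if __name__ == "__main__":
    print(find_label_definitions('![Plot](a.png){#fig-foo fig-env="figure*"}'))
    print(find_label_definitions("## Results {#sec-results}"))
```

The `[^}]*` segment is what lets `{#fig-foo fig-env=...}` match. A pattern that expects the closing brace immediately after the label would miss it.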
Change footnote colors from invisible slate-gray to distinct pink/rose
for clear visual separation from other reference types.
Remove all 27 individual color-override settings (mlsysbook.color*)
since only the preset picker (subtle/balanced/vivid) is needed.
Remove QmdDiagnosticsManager and WorkspaceLabelIndex which caused
false-positive blue squiggles during workspace index loading. Pre-commit
hooks already validate cross-references and inline Python at commit time.
Also remove div fence marker highlighting (Quarto handles natively),
add !inFence guards to label line checks, remove broken reference
decoration, and change footnote colors from gold to muted slate-gray
to convey their marginal/supplementary nature.
Remove redundant ml_ prefix from ml_workflow chapter files and update all
Quarto config references. Consolidate custom scripts into native binder
subcommands and archive obsolete tooling.
- Default label types now include Equation (was missing from default set)
- --check-patterns now defaults to True for inline-refs
- Removed redundant --all-types from VSCode extension command
All five label types (Figure, Table, Section, Equation, Listing) are
now always checked unless explicitly filtered with --figures/--tables/etc.
Port all custom validation and maintenance scripts into the binder CLI
as native subcommands, eliminating the need for standalone scripts.
New `binder validate` subcommands (10):
- section-ids: verify all headers have {#sec-...} IDs
- forbidden-footnotes: check footnotes in tables/captions/divs
- footnotes: validate footnote refs/defs (undefined, unused, duplicate)
- figure-completeness: check figures have captions and alt-text
- figure-placement: audit figure/table proximity to first reference
- index-placement: check LaTeX \index{} placement
- render-patterns: detect problematic rendering patterns
- dropcap: validate drop cap compatibility
- part-keys: validate \part{key:...} against summaries.yml
- image-refs: validate image references exist on disk
New `binder maintain` subcommands (2):
- section-ids (add/repair/list/remove): full section ID lifecycle
- footnotes (cleanup/reorganize/remove): footnote management
Updated 11 pre-commit hooks to use binder commands instead of scripts.
Updated VSCode extension commands to use binder CLI.
All validators verified against original script output (parity confirmed).
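
As an illustration of what a `section-ids` style check involves, the sketch below flags Markdown headers that lack a `{#sec-...}` ID. It is not the binder implementation: the function name is hypothetical, and the rules (skip fenced code, require the ID at the end of the header line) are assumptions.

```python
import re

HEADER = re.compile(r"^(#{1,6})\s+(.*)$")
SEC_ID = re.compile(r"\{#sec-[\w-]+[^}]*\}\s*$")


def headers_missing_sec_id(text: str) -> list[tuple[int, str]]:
    """Return (line number, header text) for headers without a {#sec-...} ID."""
    missing = []
    in_code_block = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_code_block = not in_code_block
            continue
        if in_code_block:
            continue  # '# comment' lines in code cells are not headers
        match = HEADER.match(line)
        if match and not SEC_ID.search(match.group(2)):
            missing.append((lineno, match.group(2)))
    return missing


if __name__ == "__main__":
    sample = "# Intro {#sec-intro}\n\n## Background\n"
    print(headers_missing_sec_id(sample))
```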
- Add WorkspaceLabelIndex: scans all .qmd files on activation, updates
incrementally on save, provides hasLabel() for cross-file validation
- Extend QmdDiagnosticsManager to validate references against workspace
index (not just current file); triggered on save only, not keystrokes
- Add broken reference decoration (red wavy underline) in chunk
highlighter for refs that don't resolve to any label in the workspace
- Add commands: Add Missing Section IDs, Verify Section IDs,
Validate Cross-References (command palette)
- Enable diagnostics by default (save-triggered, not noisy)
- Support YAML-style label definitions (#| label:, #| fig-label:, etc.)