Commit Graph

10630 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
0cc0361f60 fix: remove --params mirror arg from choco texlive install
The InstallerParameters flag passed to install-tl via --params was
corrupting the installer profile, causing abs_path($::installerdir)
to return undef and triggering the 'uninitialized value $tmp' Perl
error at install-tl line 651. Install without params and set the
tlmgr repository mirror post-install instead.
2026-03-02 20:16:37 -05:00
Vijay Janapa Reddi
954b7942c2 chore: harden Windows TeX Live install and default to latest
Improve Windows container reliability by pinning TeX Live installer mirrors with fallback and setting safer Chocolatey CI defaults. Make TeX Live version configurable via build arg and default to latest while retaining override support.
2026-03-02 19:32:40 -05:00
Vijay Janapa Reddi
f64ba2962c chore: resolve pre-commit warning backlog and stabilize checks
Normalize book prose/style issues across touched chapters and remove remaining structural warnings so validation output is clean and reproducible in CI. Also tighten inline/times-spacing validation behavior to reduce noisy false positives while preserving strict checks.
2026-03-02 19:04:35 -05:00
Vijay Janapa Reddi
8129e4b31f Improves artifact verification and output naming
Updates the book publishing workflow to conditionally verify downloaded artifacts based on the `deploy_target` input, preventing failures during partial deployments.

Explicitly sets the output filenames for EPUB and PDF builds in Quarto configurations, ensuring consistent naming for generated book artifacts.
2026-03-02 17:51:04 -05:00
Vijay Janapa Reddi
354cb2000f chore: extract shared HTML footer and update announcement banner
- Add config/shared/html/footer-common.yml with common page-footer elements
  (copyright/license left, GitHub/star right, background, border)
- Reduce _quarto-html-vol1.yml and _quarto-html-vol2.yml page-footer to
  volume-specific center link only; shared elements imported via metadata-files
- Update announcement bar: lead with two-volume launch, keep four-line format
2026-03-02 17:38:30 -05:00
Vijay Janapa Reddi
96fa7ac5e5 chore: bump Quarto to 1.9.27 and R to 4.5.2
- Quarto 1.9.27: Linux (.deb), Windows (direct download; Scoop Extras has 1.8.27)
- R 4.5.2: Linux (CRAN jammy-cran40), Windows (Scoop main/r)
- Baremetal: quarto-actions/setup for both Linux and Windows
- Remove ggrepel version pin (R 4.5.x supports ggrepel 0.9.7)
- Update docs: BUILD.md, CONTAINER_BUILDS.md, docker READMEs
2026-03-02 17:36:35 -05:00
Vijay Janapa Reddi
38ec2d66fb Fix image reference and pre-commit auto-fixes
- Rename _regression_testing.png to regression_testing.png for fault_tolerance.qmd
- Collapse extra blank lines (security_privacy, fault_tolerance)
- Prettify pipe tables (appendix_machine)
2026-03-02 17:21:56 -05:00
Vijay Janapa Reddi
5ec92f5e6a Merge branch 'feature/book-volumes' into dev
2026-03-02 17:16:19 -05:00
Vijay Janapa Reddi
bd151e75ca Expands TikZ libraries and color palette
Incorporates additional TikZ libraries to provide more versatile tools for diagram creation. Also introduces a new `GreenL0` color definition to extend the available color palette for visual elements.
2026-03-02 17:15:57 -05:00
Vijay Janapa Reddi
12ed6525bf Remove root clutter, archive dirs, and build artifacts
- git rm 96 files: one-off scripts (test_simulator.py, list_figs_vol1.py,
  refactor_math_prompt.md), stale archive directories
  (book/tools/scripts/_archive/, book/quarto/scripts/_archive/)
- Move SEMINAL_PAPERS_CORPUS.md and SEMINAL_PAPERS_V2.md to
  .claude/docs/shared/ for proper organization
- Delete local build artifacts: all __pycache__ dirs, .pytest_cache,
  mlsysbook.egg-info, .tito/logs
2026-03-02 17:14:50 -05:00
Vijay Janapa Reddi
0d6b8fee7a feat: add unified memory hierarchy reference and data locality invariant
2026-03-02 17:14:11 -05:00
Vijay Janapa Reddi
2bd6ed1cf0 Prefix unused images with _ and restore TikZ figures from main
- Rename 409 unused image files with _ prefix across vol1 (272) and
  vol2 (137) so they are visually identifiable without being deleted
- Restore polished TikZ figures from main branch into vol2 chapters:
  fault_tolerance, edge_intelligence, security_privacy, distributed_training,
  responsible_ai, sustainable_ai, robust_ai
- Remove all tikz-source backup blocks (0 remaining across vol2)
- Prefix 33 SVG files superseded by restored TikZ with _
- Add GreenL0, chains, shapes.arrows, decorations.pathreplacing to diagram.yml
2026-03-02 15:19:16 -05:00
Vijay Janapa Reddi
a88b25a69c Activates full book content for PDF builds
Uncomments all chapters, parts, frontmatter, and appendices in both Volume 1 and Volume 2 Quarto PDF configuration files. This ensures that the complete book content is included when generating PDF outputs.
2026-03-02 12:23:58 -05:00
Vijay Janapa Reddi
1669a5a63e Refactors diagrams to external SVG files
Replaces embedded TikZ code with external SVG image references across various chapters. This change enhances rendering performance, reduces document file size, and improves compatibility.

Includes minor text formatting adjustments for numerical values and symbols.
2026-03-02 12:01:41 -05:00
Vijay Janapa Reddi
e42c8bc4ea Refactor figures to SVG; enhance validation logic
Converts numerous inline TikZ diagrams to external SVG files across the book's content. This improves rendering performance, streamlines figure management, and ensures consistent visual presentation.

Enhances CLI validation by:
- Ignoring cross-reference IDs when checking for multiplication to prevent false positives.
- Stripping inline math spans before currency checks to avoid misinterpreting mathematical expressions as currency.
- Applying hex literal exclusions to pre-processed lines for more accurate validation.

Adds optional Matplotlib import to the plotting module for improved flexibility in environments where the library may not be available.
2026-03-02 11:59:41 -05:00
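Note on e42c8bc4ea: the "strip inline math spans before currency checks" step can be illustrated with a minimal sketch. This is not the repository's validator. The regexes and the function name `has_unescaped_currency` are hypothetical, and the sketch assumes inline math is written as `$...$` on one line and that currency should appear as an escaped `\$`.

```python
import re

# Assumption: inline math is an unescaped $...$ span on a single line.
INLINE_MATH = re.compile(r"(?<!\\)\$[^$\n]+?(?<!\\)\$")
# Assumption: an unescaped "$" directly followed by a digit is suspect currency.
CURRENCY = re.compile(r"(?<!\\)\$\d")


def has_unescaped_currency(line: str) -> bool:
    """Return True if the line still has a bare $<digit> after math is removed."""
    # Remove math spans first so "$10\times$" is not mistaken for "$10".
    without_math = INLINE_MATH.sub("", line)
    return bool(CURRENCY.search(without_math))


math_line = r"a $10\times$ speedup"
bare_line = "costs $5 today"
escaped_line = r"costs \$5 today"
```

Without the stripping step, `math_line` would be flagged because it contains `$1`. With it, only `bare_line` is reported.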
Vijay Janapa Reddi
d21e34ab73 Refines numerical multiplier formatting
Standardizes the representation of numerical multipliers and ranges across Quarto documents. This change improves the typographic rendering of expressions like `$10\times$` and `1.3--$2\times$`, enhancing consistency and readability of the book's content.
2026-03-02 11:56:05 -05:00
Vijay Janapa Reddi
1f568f4283 Remove hallucinator from default dependency set.
Avoid blocking CI and local bootstrap on an optional reference-check package that is not required by the pre-commit validation path.
2026-03-02 10:44:53 -05:00
Vijay Janapa Reddi
5c19052d2a Add pandas fallback for pre-commit hooks
Include pandas in validate-dev fallback dependencies to satisfy mlsysim.sim imports during book-mlsys-test-units when base requirements install partially fails.
2026-03-02 10:35:21 -05:00
Vijay Janapa Reddi
fdd90ce139 Stabilize dev pre-commit workflow
Add fallback hook dependencies in validate-dev and apply trailing-whitespace fixes to lab plan files so pre-commit no longer fails on auto-modifications.
2026-03-02 10:22:41 -05:00
Vijay Janapa Reddi
a342170b67 Fix pre-commit fallback dependency install
Install rich in validate-dev pre-commit step so book hooks still run when full requirements install partially fails.
2026-03-02 10:16:09 -05:00
Vijay Janapa Reddi
e0117cebfa Merge feature/book-volumes: volumes + tinytorch + kits + colab
2026-03-02 09:45:48 -05:00
Vijay Janapa Reddi
1052b2be31 Update book workflows for volume-only builds
Switch container/baremetal/validate/preview/live flows to vol1+vol2 artifacts, keep baremetal in dev validation, and add stable single-book navbar link.
2026-03-02 09:45:40 -05:00
Vijay Janapa Reddi
a7f9367e42 Merge dev into feature/book-volumes: CI, contributors, workflows
# Conflicts:
#	README.md
2026-03-02 09:38:47 -05:00
Vijay Janapa Reddi
900b8e6f66 Merge feature/colab-notebooks into feature/book-volumes
2026-03-02 09:38:26 -05:00
Vijay Janapa Reddi
a24d00aba4 Merge feature/hardware-kits into feature/book-volumes
2026-03-02 09:38:24 -05:00
Vijay Janapa Reddi
48b519c42e Merge feature/tinytorch-core into feature/book-volumes
# Conflicts:
#	README.md
#	tinytorch/src/01_tensor/01_tensor.py
#	tinytorch/src/15_quantization/ABOUT.md
2026-03-02 09:38:08 -05:00
Vijay Janapa Reddi
73db0e021a Streamlines chapter introduction
Removes a sentence that summarized the chapter's structure.
This change simplifies the immediate opening, aligning with broader content organization efforts.
2026-03-02 09:37:06 -05:00
Vijay Janapa Reddi
0ae4545bbc Enables multi-project contributor additions
Allows a single `@all-contributors` comment to add or update a contributor across multiple projects simultaneously.

Updates the workflow to:
- Detect multiple projects from explicit mentions in the trigger comment.
- Iterate over all detected projects to update their respective `.all-contributorsrc` files and project `README.md` tables.
- Adapt commit messages and bot replies to reflect multi-project changes.

This improves efficiency for managing contributors in multi-project repositories by reducing repetitive commands.
2026-03-02 09:27:59 -05:00
github-actions[bot]
358879a300 docs: add @salmanmkc as book contributor for doc
2026-03-02 14:01:05 +00:00
Vijay Janapa Reddi
8abbf533d8 fix(ci): bump R package cache to fix missing ggrepel on Linux builds
- Add -v2 to cache key to invalidate stale R package cache
- Add ggrepel to verification step to catch missing packages early
- Fixes hw_acceleration.qmd build failure (fig-processor-trends chunk)
2026-03-02 08:49:07 -05:00
Vijay Janapa Reddi
8a1b0b8cd5 Reorganizes Introduction chapter content and prose
Moves the 'Scaling the Machine: From Node to Fleet' section to a more logical position
within the chapter, following the discussion on defining ML systems.

Refines various sentences for improved clarity, conciseness, and a more formal,
impersonal tone. Adds an introductory sentence to better outline the chapter's
structure and movements.
2026-03-02 08:38:57 -05:00
Vijay Janapa Reddi
ca34ba6bc7 fix: update lab_17_ml_conclusion with spec-accurate structure
Second agent pass matched the LABS_SPEC brief more precisely:
- Act I renamed to 'Design Ledger Archaeology' — reads actual ledger history,
  computes per-domain constraint hit rate, renders radar chart + bar chart
- Act II is 'The Final Architecture Challenge' with 6 simultaneous scorecard
  constraints (accuracy, P99, DP, adversarial, carbon, fault tolerance)
- Stakeholder scenario: Chief Architect / Principal Engineer promotion framing
- Medical fleet (1000 hospitals, 100k inferences/day) as the deployment target
- Curriculum journey timeline grid (all 33 labs) in closing section
- All constants match spec: FLEET_SIZE_NODES=1000, COAL_CI_G_KWH=820, etc.
2026-03-01 20:06:54 -05:00
Vijay Janapa Reddi
9b2f6ee01b fix: update lab_15_sustainable_ai with spec-accurate carbon constants
Second agent pass produced higher-fidelity implementation:
- Carbon intensity corrected to spec values: coal 820 gCO2/kWh (was 400),
  renewable 40 gCO2/kWh (was 10) — matching @tbl-carbon-intensity and EPA eGRID 2022
- Stakeholder scenario aligned to LABS_SPEC brief (1000-node cluster, 30% flexible jobs)
- SLA failure state is kind='danger'; below-target is kind='warn'
- Deployment contexts use spec-exact labels (Coal Region vs Renewable Region)
- All syntax-verified clean
2026-03-01 20:05:06 -05:00
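Note on 9b2f6ee01b: a short worked calculation shows what the corrected carbon intensities change. The 820 and 40 gCO2/kWh values and the earlier 400 and 10 values come from the commit message. `COAL_CI_G_KWH` appears in ca34ba6bc7, while `RENEWABLE_CI_G_KWH`, `emissions_kg`, and the 1000 kWh workload are illustrative assumptions, not the lab's code.

```python
COAL_CI_G_KWH = 820       # gCO2/kWh, value stated in the commit message
RENEWABLE_CI_G_KWH = 40   # gCO2/kWh, value from the commit; name is hypothetical


def emissions_kg(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Convert energy use to kg of CO2 for a given grid carbon intensity."""
    return energy_kwh * intensity_g_per_kwh / 1000.0


energy_kwh = 1000.0  # illustrative workload, not a figure from the lab
coal_kg = emissions_kg(energy_kwh, COAL_CI_G_KWH)
renewable_kg = emissions_kg(energy_kwh, RENEWABLE_CI_G_KWH)
ratio = coal_kg / renewable_kg

# Gap implied by the earlier constants (400 and 10 gCO2/kWh).
old_ratio = 400 / 10
```

Under these assumptions the same workload emits 820 kg in the coal region and 40 kg in the renewable region, a 20.5x gap. The earlier constants implied a 40x gap.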
Vijay Janapa Reddi
6f5732558f feat: add complete first-draft labs for both volumes (33 Marimo labs)
Add all Vol1 (labs 01-16) and Vol2 (labs 01-17) interactive Marimo labs
as the first full first-pass implementation of the ML Systems curriculum labs.

Each lab follows the PROTOCOL 2-Act structure (35-40 min):
- Act I: Calibration with prediction lock → instruments → overlay
- Act II: Design challenge with failure states and reflection

Key pedagogical instruments introduced progressively:
- Vol1: D·A·M Triad, Iron Law, Memory Ledger, Roofline, Amdahl's Law,
  Little's Law, P99 Histogram, Compression Frontier, Chouldechova theorem
- Vol2: NVLink vs PCIe cliff, Bisection BW, Young-Daly T*, Parallelism Paradox,
  AllReduce ring vs tree, KV-cache model, Jevons Paradox, DP ε-δ tradeoff,
  SLO composition, Adversarial Pareto, two-volume synthesis capstone

All 35 staged files pass AST syntax verification (36/36 including lab_00).

Also includes:
- labs/LABS_SPEC.md: authoritative sub-agent brief for all lab conventions
- labs/core/style.py: expanded unified design system with semantic color tokens
2026-03-01 19:59:04 -05:00
Vijay Janapa Reddi
67e549e482 feat(labs): rewrite lab_00 as ML Systems architect portal
Complete rewrite of lab_00_introduction.py with four sections:

1. The 95% Problem — ML systems vs ML framing (not models, infrastructure)
2. Physical Constraints — speed of light, thermodynamics, memory physics
3. Four Deployment Regimes — Cloud/Edge/Mobile/TinyML constraint walls
4. Interface Orientation — live cockpit tour (tabs, levers, prediction lock, MathPeek)

Each concept block gates the next via mo.stop() with structured checks:
- Check 1: radio MCQ (silent degradation / ML systems domain)
- Check 2: multiselect (AV latency / speed-of-light constraint)
- Check 3: radio scenario (ICU sensor / TinyML constraint analysis)

Interface tour uses real mo.ui components (dropdown, slider, accordion)
so students build motor memory for the cockpit before Lab 01 content begins.

Design Ledger initialized at completion with deployment context + check answers.
Fix: DesignLedger import corrected to labs.core.state (not mlsysim.sim.ledger).
Verified: Exit 0 under Python 3.13 with marimo 0.19.6.
2026-03-01 19:11:31 -05:00
Vijay Janapa Reddi
7e6cbc96ca Merge remote-tracking branch 'origin/dev' into dev
2026-03-01 18:43:22 -05:00
Vijay Janapa Reddi
c56cb62c25 feat: implement mlsysim dashboard platform and initial interactive labs
- Implement universal 4-zone dashboard cockpit in mlsysim.viz.dashboard
- Add Lab 00: Flight School (Persona & Dashboard Onboarding)
- Add Lab 15: Sustainable AI (Grid-Interactive Scheduler Dashboard)
- Update Mission Plans for Systems, Data, and Orchestration with 3-act narrative
- Establish mlsysim at repo root as future-proof analytical engine
2026-03-01 18:39:13 -05:00
Vijay Janapa Reddi
533cfa6e99 fix: pre-commit hooks — all 48 checks now pass
- book/quarto/mlsys/__init__.py: add repo-root sys.path injection so
  mlsysim is importable when scripts run from book/quarto/ context
- book/quarto/mlsys/{constants,formulas,formatting,hardware}.py: new
  compatibility shims that re-export from mlsysim.core.* and mlsysim.fmt
- mlsysim/viz/__init__.py: remove try/except for dashboard import; use
  explicit "import from mlsysim.viz.dashboard" pattern instead
- .codespell-ignore-words.txt: add "covert" (legitimate security term)
- book/tools/scripts/reference_check_log.txt: delete generated artifact
- Various QMD, bib, md files: auto-formatted by pre-commit hooks
  (trailing whitespace, bibtex-tidy, pipe table alignment)
2026-03-01 17:30:24 -05:00
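Note on 533cfa6e99: the "repo-root sys.path injection" in book/quarto/mlsys/__init__.py might look roughly like the sketch below. The helper names `repo_root_for` and `ensure_on_sys_path` are hypothetical, and the three-levels-up assumption is inferred only from the path given in the commit message.

```python
import sys
from pathlib import Path


def repo_root_for(init_file: Path) -> Path:
    """Return the repo root for <root>/book/quarto/mlsys/__init__.py.

    parents[0] is mlsys/, parents[1] is quarto/, parents[2] is book/,
    so parents[3] is the assumed repository root.
    """
    return init_file.parents[3]


def ensure_on_sys_path(root: Path) -> None:
    """Prepend the root to sys.path once so top-level packages resolve."""
    root_str = str(root)
    if root_str not in sys.path:
        sys.path.insert(0, root_str)


# Illustrative path only; a real __init__.py would pass Path(__file__).resolve().
example_root = repo_root_for(Path("/repo/book/quarto/mlsys/__init__.py"))
```

Calling `ensure_on_sys_path(example_root)` twice leaves a single entry, which keeps repeated imports from growing `sys.path`.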
Vijay Janapa Reddi
c30f2a3bfd refactor: move mlsysim to repo root, extract fmt module from viz
Moves the mlsysim package from book/quarto/mlsysim/ to the repo root
so it is importable as a proper top-level package across the codebase.

Key changes:
- mlsysim/fmt.py: new top-level module for all formatting helpers (fmt,
  sci, check, md_math, fmt_full, fmt_split, etc.), moved out of viz/
- mlsysim/viz/__init__.py: now exports only plot utilities; dashboard.py
  (marimo-only) is no longer wildcard-exported and must be imported
  explicitly by marimo labs
- mlsysim/__init__.py: added `from . import fmt` and `from .core import
  constants`; removed broken `from .viz import plots as viz` alias
- execute-env.yml: fixed PYTHONPATH from "../../.." to "../.." so
  chapters resolve to repo root, not parent of repo
- 51 QMD files: updated `from mlsysim.viz import <fmt-fns>` to
  `from mlsysim.fmt import <fmt-fns>`
- book/quarto/mlsys/: legacy shadow package contents cleaned up;
  stub __init__.py remains for backward compat
- All Vol1 and Vol2 chapters verified to build with `binder build pdf`
2026-03-01 17:24:11 -05:00
Vijay Janapa Reddi
6a763c2552 Fix Node 1 NVLink ring arrowhead tangents in hierarchical-allreduce.svg
Offset the 2nd bezier control point x from the endpoint x on all four
Node 1 ring arcs so orient="auto" computes a diagonal arrival angle
instead of a straight vertical arrowhead.
2026-03-01 16:02:21 -05:00
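Note on 6a763c2552: the arrowhead fix follows from the geometry of cubic Bezier curves. The tangent at the end of the curve points from the second control point to the endpoint, and a marker with orient="auto" is rotated to the path direction at that point. The sketch below uses made-up coordinates rather than values from hierarchical-allreduce.svg.

```python
import math


def end_tangent_angle_deg(p2, p3):
    """Angle of a cubic Bezier's end tangent, in degrees.

    At t = 1 the derivative is 3 * (P3 - P2), so only the second control
    point and the endpoint matter. Angles are in SVG coordinates (y down).
    """
    dx = p3[0] - p2[0]
    dy = p3[1] - p2[1]
    return math.degrees(math.atan2(dy, dx))


endpoint = (100.0, 100.0)                 # illustrative endpoint
aligned_ctrl = (100.0, 50.0)              # same x as the endpoint
offset_ctrl = (80.0, 50.0)                # x shifted away from the endpoint

vertical_angle = end_tangent_angle_deg(aligned_ctrl, endpoint)
diagonal_angle = end_tangent_angle_deg(offset_ctrl, endpoint)
```

When the control point shares the endpoint's x, the arrival angle is 90 degrees (straight vertical). Shifting the control point's x by 20 units in this example yields roughly 68 degrees, a diagonal arrival.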
Vijay Janapa Reddi
b0d826df64 Add Vol 2 textbook-quality SVG figures across all 17 chapters
Generated and audited 122 SVG figures covering all Vol 2 chapters:
introduction, compute_infrastructure, network_fabrics, data_storage,
distributed_training, collective_communication, fault_tolerance,
performance_engineering, inference, fleet_orchestration, ops_scale,
edge_intelligence, responsible_ai, robust_ai, security_privacy,
sustainable_ai. All figures follow the shared SVG style guide
(680x460 viewBox, Helvetica Neue, no embedded titles). Layout audit
applied 11 fixes for text overflow, out-of-bounds elements, and
missing arrowheads.
2026-03-01 15:51:20 -05:00
github-actions[bot]
ae4322101d Update contributors list [skip ci]
2026-03-01 16:41:28 +00:00
Vijay Janapa Reddi
7994f91e0e Merge pull request #1178 from salmanmkc/upgrade-github-actions-node24
Upgrade GitHub Actions for Node 24 compatibility
2026-03-01 11:36:51 -05:00
Vijay Janapa Reddi
6bddf33d1a Merge pull request #1208 from harishb00/patch-1
Fixed typo in GitHub user links and avatars in README
2026-03-01 11:33:48 -05:00
Vijay Janapa Reddi
bf9c402827 Adds callout-definition blocks to all Vol.2 chapters and fixes pre-commit hook errors
- Adds standardized callout-definition blocks with bold term + clear definition
  to all Vol.2 chapters (distributed training, inference, network fabrics, etc.)
- Fixes caption_inline_python errors: replaces Python inline refs in table
  captions with static text in responsible_engr, appendix_fleet, appendix_reliability,
  compute_infrastructure
- Fixes undefined_inline_ref errors: adds missing code fence for PlatformEconomics
  class in ops_scale.qmd; converts display math blocks with Python refs to prose
- Fixes render-pattern errors: moves inline Python outside $...$ math delimiters
  in conclusion, fleet_orchestration, inference, introduction, network_fabrics,
  responsible_ai, security_privacy, sustainable_ai, distributed_training
- Fixes dropcap errors: restructures drop-cap sentences in hw_acceleration and
  nn_architectures to not start with cross-references
- Fixes unreferenced-label errors: removes @ prefix from @sec-/@tbl- refs inside
  Python comment strings in training, model_compression, ml_systems
- Adds clientA to codespell ignore words (TikZ node label in edge_intelligence)
- Updates mlsys constants, hardware, models, and test_units for Vol.2 calculations
- Updates _quarto.yml and references.bib for two-volume structure
2026-03-01 10:44:33 -05:00
Harish
9a6a363b62 Update GitHub user links and avatars in README
2026-03-01 15:18:51 +05:30
Vijay Janapa Reddi
69736d3bdb updates
2026-02-28 18:20:47 -05:00
Vijay Janapa Reddi
3266bc7dfa Standardize chapter discovery via Quarto config
Refactors chapter discovery across CLI commands to use a single, canonical source of truth: the volume's Quarto PDF configuration file.

Introduces a new `get_chapters_from_config` function in `core/discovery.py` that parses the `_quarto-pdf-{volume}.yml` to derive the ordered list of testable chapter stems. This ensures consistent chapter order for `build` and `debug` operations, reducing duplication and improving maintainability.

Updates `build.py` and `debug.py` to delegate all chapter list retrieval to this new centralized method within `ChapterDiscovery`. Also enhances chapter QMD file location to support shared content paths.
2026-02-28 17:08:17 -05:00
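Note on 3266bc7dfa: deriving an ordered chapter list from the volume's PDF config could be sketched as below. This is not the actual `get_chapters_from_config` in `core/discovery.py`. The function name `chapter_stems`, the line-based parsing, and the sample paths are all assumptions, chosen so the example needs only the standard library. It assumes chapters appear as `- path/to/stem.qmd` list items and that disabled chapters are commented out with `#`.

```python
import re
from pathlib import PurePosixPath

# Assumption: each active chapter is a YAML list item ending in .qmd.
QMD_ENTRY = re.compile(r"^\s*-\s*(\S+\.qmd)\s*$")


def chapter_stems(config_text: str) -> list[str]:
    """Return chapter stems in config order, skipping commented-out entries."""
    stems: list[str] = []
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out chapter
        match = QMD_ENTRY.match(line)
        if match:
            stem = PurePosixPath(match.group(1)).stem
            if stem not in stems:  # keep the first occurrence only
                stems.append(stem)
    return stems


# Hypothetical config excerpt; real paths in the repository may differ.
sample_config = """\
book:
  chapters:
    - contents/vol1/introduction/introduction.qmd
    # - contents/vol1/training/training.qmd
    - contents/vol1/ml_systems/ml_systems.qmd
"""

stems = chapter_stems(sample_config)
```

Reading the order from one config file means `build` and `debug` commands see the same sequence, which is the consistency the commit describes.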
Vijay Janapa Reddi
ae6f5d9f11 Refines book structure; modularizes embedded code and updates content
Updates Quarto configurations to reorder, add, and rename appendices across all output formats for both volumes, and includes previously commented chapters in PDF builds.

Encapsulates Python calculation logic and exported variables within dedicated classes across numerous Quarto documents, improving modularity, maintainability, and clarity of in-text references.

Refines MLOps definitions, corrects TCO calculation with distinct inference GPU rates, adjusts distributed training scaling scenarios (e.g., commodity network bandwidth), and clarifies network fabric details (e.g., FEC latency).
2026-02-28 17:00:09 -05:00
Vijay Janapa Reddi
d299e49d10 update
2026-02-28 16:25:00 -05:00