Commit Graph

27 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
b9ee88ca70 docs(readmes): stretch HTML tables to full width
Add `width="100%"` to every HTML content and contributor table across all
project READMEs so they render full-width on GitHub instead of collapsing
to natural content width. Cell-level `width="X%"` percentages were already
in place but only take effect once the table itself has an explicit width.

Also update the contributor-sync scripts so the auto-generated tables stay
consistent on the next bot run:
  - .github/workflows/contributors/generate_main_readme.py
  - .github/workflows/contributors/generate_readme_tables.py

Scope: 27 files, 85 tables. Sub-project READMEs that already use the
"card" pattern (labs/, kits/ content sections with <table width="98%">
wrappers) are intentionally untouched.
2026-04-22 16:01:54 -04:00
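The script change itself is not shown here. The sketch below illustrates the table rewrite. It assumes a regex pass over raw README HTML, and `stretch_tables` is a hypothetical helper, not the code in generate_readme_tables.py. It adds the attribute only when no `width` is present, which is one way to leave the `width="98%"` card wrappers untouched:

```python
import re

# An opening <table ...> tag, capturing its attribute string.
TABLE_OPEN = re.compile(r"<table\b([^>]*)>", re.IGNORECASE)
# An existing width attribute, e.g. the width="98%" card wrappers.
HAS_WIDTH = re.compile(r"\bwidth\s*=", re.IGNORECASE)


def stretch_tables(html: str) -> str:
    """Add width="100%" to every <table> that does not declare a width yet."""

    def fix(match: re.Match) -> str:
        attrs = match.group(1)
        if HAS_WIDTH.search(attrs):
            # Tables that already set a width are left exactly as they are.
            return match.group(0)
        return f'<table width="100%"{attrs}>'

    return TABLE_OPEN.sub(fix, html)
```

Cell-level `width="X%"` attributes on `<td>` are not touched, because the pattern only matches `<table` tags.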
Vijay Janapa Reddi
a5aef4b85a chore(pre-commit): auto-format files and add mitpress-spaced-slash hook 2026-04-21 08:38:58 -04:00
Vijay Janapa Reddi
c6c1b78e40 fix(epub): eliminate remaining RSC-005 (alt-on-wrapper + dupe SVG defs)
After the FATAL/URL/OPF/cross-ref fixes, the residual epubcheck errors
were all RSC-005 (214 vol1, 307 vol2): mostly "attribute alt not allowed
here", plus a small tail of "Duplicate id=arrow" on 6 SVG files in vol2.
Two independent mechanisms, one error class.

Post-process: alt -> aria-label on non-img wrappers
  Quarto emits `fig-alt` as alt="..." on the enclosing
  <div class="quarto-figure">, while the inner <img> already carries
  alt="" (empty). Strict XHTML forbids alt on non-image elements.
  Fix: in the sanitize_xml_for_epubcheck pass, rewrite alt="..." on
  any element other than img/area/input to aria-label="...". If an
  aria-label already exists on the element the alt is stripped instead
  of duplicated. 214 rewrites vol1 + 271 rewrites vol2.
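
The following is a minimal sketch of this rewrite. It assumes a regex-based string pass like the rest of sanitize_xml_for_epubcheck, but the helper name `alt_to_aria_label` and the tag pattern are assumptions, and a quoted `>` inside an attribute value is not handled:

```python
import re

# Elements on which alt is legitimate; every other element gets rewritten.
ALT_ALLOWED = {"img", "area", "input"}
# An opening (or self-closing) tag: name, optional attributes, optional "/".
OPEN_TAG = re.compile(r"<([A-Za-z][\w:-]*)(\s[^<>]*?)?(/?)>")
ALT_ATTR = re.compile(r'\salt="([^"]*)"')


def alt_to_aria_label(xhtml: str) -> str:
    """Rewrite alt="..." on non-image elements to aria-label="..."."""

    def fix(m: re.Match) -> str:
        name, attrs, slash = m.group(1), m.group(2) or "", m.group(3)
        if name.lower() in ALT_ALLOWED:
            return m.group(0)
        alt = ALT_ATTR.search(attrs)
        if alt is None:
            return m.group(0)
        if re.search(r"\saria-label=", attrs):
            # An aria-label already exists: strip alt instead of duplicating.
            attrs = attrs[: alt.start()] + attrs[alt.end() :]
        else:
            attrs = attrs[: alt.start()] + f' aria-label="{alt.group(1)}"' + attrs[alt.end() :]
        return f"<{name}{attrs}{slash}>"

    return OPEN_TAG.sub(fix, xhtml)
```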

Source SVGs: remove duplicate <marker> defs
  Seven SVGs under contents/vol2/.../images/svg/ contained a
  duplicated block of <marker id="arrow"/>, <marker id="arrow-red"/>,
  <marker id="arrow-green"/> inside their <defs> element — evidently
  a copy-paste slip during authoring. Six of the seven are referenced
  in the vol2 EPUB and were raising RSC-005 "Duplicate id" under
  epubcheck. Fix: dedupe markers by id in-place (keep first, drop
  subsequent). 3 duplicates removed per file.

  Affected files:
    book/quarto/contents/vol2/ops_scale/images/svg/ecommerce-dependency-graph.svg
    book/quarto/contents/vol2/ops_scale/images/svg/time-travel.svg
    book/quarto/contents/vol2/responsible_ai/images/svg/reward-hacking-loop.svg
    book/quarto/contents/vol2/robust_ai/images/svg/autoencoder.svg
    book/quarto/contents/vol2/robust_ai/images/svg/gradient-attack.svg
    book/quarto/contents/vol2/sustainable_ai/images/svg/ai-lca.svg
    book/quarto/contents/vol2/sustainable_ai/images/svg/water-cycle.svg
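
The dedupe can also be sketched as a string pass. This assumes that markers do not nest and that each carries a double-quoted `id`. `dedupe_markers` is a hypothetical name, and it returns the number of markers it dropped so the "3 per file" count can be checked:

```python
import re

# A <marker> element, either self-closing or with children (no nesting assumed).
MARKER = re.compile(r"<marker\b([^>]*?)(?:/>|>.*?</marker>)", re.DOTALL)
ID_ATTR = re.compile(r'\sid="([^"]*)"')


def dedupe_markers(svg: str) -> tuple[str, int]:
    """Keep the first <marker> for each id, drop later ones, count removals."""
    seen: set[str] = set()
    removed = 0

    def fix(m: re.Match) -> str:
        nonlocal removed
        id_match = ID_ATTR.search(m.group(1))
        if id_match is None:
            return m.group(0)
        marker_id = id_match.group(1)
        if marker_id in seen:
            removed += 1
            return ""
        seen.add(marker_id)
        return m.group(0)

    return MARKER.sub(fix, svg), removed
```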

Validation (epubcheck 5.3.0, both volumes rebuilt from scratch):
  Vol1: 0 fatals / 0 errors / 0 warnings / 0 infos
  Vol2: 0 fatals / 0 errors / 0 warnings / 0 infos

Combined with the prior two commits, this closes the epubcheck work
for issues #1014, #1052, #1148. Total error reduction: 1000 -> 0
(vol1 346->0, vol2 654->0).
2026-04-20 15:37:23 -04:00
Vijay Janapa Reddi
bf710fc965 fix(epub): sanitize XHTML/SVG/OPF so epubcheck FATALs drop to zero
Extends epub_postprocess.py (already invoked by the EPUB configs as a
post-render hook) with three string-level passes that fix the 11 FATAL
epubcheck errors that caused Kindle and ClearView to reject the EPUB,
plus two smaller error classes:

  - RSC-016 FATAL: "--" inside HTML comment bodies.
    Quarto's EPUB filter wraps raw TikZ source in <!-- ... --> for
    figures that have a PNG fallback. TikZ path syntax (\draw (a) -- (b))
    produces "--" inside the comment, violating the XML comment spec.
    Fix: replace every "--" inside a comment body with "- -" (XML-safe).
    3 FATALs cleared in vol2.

  - RSC-016 FATAL: bare <br> tags.
    Pandoc emits HTML5 bare <br> inside XHTML for multi-line table
    cells that use the book's `•<br>` bullet convention. Strict XML
    parsers (Kindle, epubcheck) require self-closing <br/>.
    Fix: rewrite <br> to <br/>.
    1 FATAL vol1 + 2 FATALs vol2 cleared (plus 41 more non-FATAL fixes).

  - RSC-016 FATAL: C0 control chars in SVG aria-label attributes.
    The matplotlib->SVG pipeline emits aria-label values containing
    U+0003 / U+000F / U+001D characters from raw-bytes representations.
    XML 1.0 forbids these in attribute values.
    Fix: strip C0 controls (except TAB/LF/CR) from every aria-label.
    2 FATALs vol1 + 3 FATALs vol2 cleared.
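
The three FATAL passes above can be sketched roughly as follows as string-level rewrites. This is not the code in epub_postprocess.py, and the function names are hypothetical. The trailing-hyphen guard on comments goes beyond what the commit describes; it is included because XML 1.0 also forbids a comment body that ends in `-`:

```python
import re

COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
# <br>, <br >, <br class="x"> but not <br/> or <br />.
BARE_BR = re.compile(r"<br\b([^>]*?)\s*(?<!/)>")
ARIA_LABEL = re.compile(r'aria-label="([^"]*)"')
# C0 controls except TAB (0x09), LF (0x0A), CR (0x0D): illegal in XML 1.0.
C0_CONTROLS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def fix_comment(m: re.Match) -> str:
    body = m.group(1)
    # "--" is illegal inside a comment; the loop also breaks up runs like "---".
    while "--" in body:
        body = body.replace("--", "- -")
    # A body ending in "-" would form "--->", which is also illegal.
    if body.endswith("-"):
        body += " "
    return f"<!--{body}-->"


def sanitize_xhtml(text: str) -> str:
    text = COMMENT.sub(fix_comment, text)
    # Self-closing form that strict XML parsers accept.
    text = BARE_BR.sub(r"<br\1/>", text)
    # Strip forbidden control characters from aria-label values only.
    return ARIA_LABEL.sub(
        lambda m: 'aria-label="' + C0_CONTROLS.sub("", m.group(1)) + '"', text
    )
```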

Also fixes two non-FATAL classes in the same pipeline since they are
mechanical string fixes on the same extracted EPUB tree:

  - RSC-020: BibTeX escape leaks in href URLs (\_, \%) and raw/encoded
    angle brackets in DOI URLs (SICI DOIs like
    10.1002/(sici)...<995::aid-spe111>3.0.co;2-6).
    Fix: unescape \_ -> _, \% -> %, and percent-encode < / > / &lt; / &gt;
    inside every href.
    27 RSC-020 errors cleared (13 vol1 + 14 vol2); 0 remaining.

  - OPF-014: nav.xhtml contains <math> elements but the OPF manifest
    does not declare the mathml property on the nav item.
    Fix: add `mathml` to the space-separated properties= attribute
    of the nav manifest entry in content.opf.
    2 OPF-014 errors cleared (1 per volume); 0 remaining.
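
The href and OPF fixes can be sketched as plain string passes. `fix_hrefs` and `add_mathml_to_nav` are hypothetical names. The `properties` edit appends `mathml` only when `nav` is present and `mathml` is not, so running it twice is harmless:

```python
import re

HREF = re.compile(r'href="([^"]*)"')
ITEM = re.compile(r"<item\b[^>]*>")  # \b keeps <itemref> from matching
PROPERTIES = re.compile(r'properties="([^"]*)"')


def fix_href(m: re.Match) -> str:
    url = m.group(1)
    # Undo BibTeX escapes that leaked into URLs.
    url = url.replace("\\_", "_").replace("\\%", "%")
    # Percent-encode angle brackets, whether entity-encoded or raw.
    for raw, encoded in (("&lt;", "%3C"), ("&gt;", "%3E"), ("<", "%3C"), (">", "%3E")):
        url = url.replace(raw, encoded)
    return f'href="{url}"'


def fix_hrefs(text: str) -> str:
    return HREF.sub(fix_href, text)


def add_mathml_to_nav(opf: str) -> str:
    """Add 'mathml' to the properties of the manifest item marked 'nav'."""

    def fix_item(m: re.Match) -> str:
        tag = m.group(0)
        props = PROPERTIES.search(tag)
        if props is None:
            return tag
        tokens = props.group(1).split()
        if "nav" not in tokens or "mathml" in tokens:
            return tag
        tokens.append("mathml")
        return tag[: props.start(1)] + " ".join(tokens) + tag[props.end(1) :]

    return ITEM.sub(fix_item, opf)
```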

Validation (epubcheck 5.3.0, on both volumes):
  - vol1: 346 errors -> 215 errors, 3 FATAL -> 0 FATAL
  - vol2: 654 errors -> 307 errors, 8 FATAL -> 0 FATAL
  - Residual is 521 RSC-005 + 1 RSC-007, tracked as P4/P6.

Relates to: #1014 (EPUB load failures), #1052 (ClearView "--" rejection),
#1148 (Kindle E999). These three issues require FATAL=0 to be resolved;
full resolution is pending validation on a CI-built artifact.
2026-04-20 15:16:25 -04:00
Vijay Janapa Reddi
aadaf5b13a docs: convert all README markdown tables to HTML format
Standardize table formatting across 25 README files to use
HTML tables with consistent styling (thead/tbody, column widths,
bold labels) matching the main README's presentation.
2026-03-17 08:57:21 -04:00
Vijay Janapa Reddi
059291f243 fix: resolve Windows Unicode encoding errors in PDF builds
- Add PYTHONUTF8=1 env var to all Windows Docker run commands (PEP 540)
- Fix generate_figure_list.py to explicitly use encoding='utf-8' in
  write_text() instead of relying on system default (cp1252 on Windows)
- The ≈ character (\u2248) in Vol I content triggered charmap codec errors
2026-03-06 17:18:37 -05:00
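The following short illustration shows the failure and the fix, using only CPython's standard codecs. U+2248 has no cp1252 mapping, and an explicit `encoding="utf-8"` on `write_text()` avoids the locale default whether or not `PYTHONUTF8` is set:

```python
import tempfile
from pathlib import Path

text = "a \u2248 b\n"  # U+2248, the character from the Vol I content

# cp1252 (the usual Windows default code page) cannot encode U+2248.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# An explicit encoding makes the write independent of the system locale.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "figure_list.txt"
    out.write_text(text, encoding="utf-8")
    round_trip = out.read_text(encoding="utf-8")
```

With `PYTHONUTF8=1`, `sys.flags.utf8_mode` reports 1 and the default text encoding becomes UTF-8 (PEP 540). The env var and the explicit argument therefore act as two independent safeguards.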
Vijay Janapa Reddi
bd5dd6f088 Enhances Quarto build for robustness and dynamic cross-referencing
Configures explicit `render` paths for both volumes to ensure complete and correct builds, particularly for selective rendering workflows.

Replaces the static cross-reference fix script with a dynamic version. This new script automatically discovers and resolves internal links from QMD sources, improving maintainability and ensuring links remain functional during partial book builds.

Adds a new script to check and auto-fix bibliography completeness, facilitating self-contained volumes.

Removes redundant empty Python code blocks from chapter QMDs and refines frontmatter content for consistency.
2026-03-03 16:04:25 -05:00
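The discovery half of the dynamic cross-reference script could be sketched as shown here. It assumes that section IDs are declared in `{#sec-...}` heading attributes and cited as `@sec-...`. `index_labels` and `external_refs` are hypothetical names, and the commit does not describe how the real script rewrites the unresolved links:

```python
import re

# A heading that carries a section ID, e.g. "## Training {#sec-training}".
HEADING_ID = re.compile(r"^#+\s.*\{#(sec-[\w-]+)[^}]*\}", re.MULTILINE)
# A cross-reference in prose, e.g. "@sec-training".
XREF = re.compile(r"@(sec-[\w-]+)")


def index_labels(sources: dict[str, str]) -> dict[str, str]:
    """Map every section ID found in the QMD sources to the file defining it."""
    index: dict[str, str] = {}
    for path, text in sources.items():
        for label in HEADING_ID.findall(text):
            index.setdefault(label, path)
    return index


def external_refs(sources: dict[str, str], rendered: set[str]) -> dict[str, str]:
    """Refs used in rendered files whose targets live in files not rendered."""
    index = index_labels(sources)
    result: dict[str, str] = {}
    for path in rendered:
        for label in XREF.findall(sources[path]):
            target = index.get(label)
            if target is not None and target not in rendered:
                result[label] = target
    return result
```

In a partial build, the labels returned by `external_refs` are exactly the ones Quarto cannot resolve locally and that need a fallback link.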
Vijay Janapa Reddi
12ed6525bf Remove root clutter, archive dirs, and build artifacts
- git rm 96 files: one-off scripts (test_simulator.py, list_figs_vol1.py,
  refactor_math_prompt.md), stale archive directories
  (book/tools/scripts/_archive/, book/quarto/scripts/_archive/)
- Move SEMINAL_PAPERS_CORPUS.md and SEMINAL_PAPERS_V2.md to
  .claude/docs/shared/ for proper organization
- Delete local build artifacts: all __pycache__ dirs, .pytest_cache,
  mlsysbook.egg-info, .tito/logs
2026-03-02 17:14:50 -05:00
Vijay Janapa Reddi
f6f98266a0 vol2: comprehensive transformation pass (P.I.C.O. refactor, archetypes, hardware trajectories) 2026-02-23 17:38:37 -05:00
Vijay Janapa Reddi
951669d356 fix: inline math × — dimensions as $N\times M$, multipliers as N$\times$
- Fix rendering: dimensions (e.g. 224×224) use single math span $N\times M$
- Revert multipliers to N$\times$ / N--M$\times$ per LaTeX convention
- Fix malformed $N\times$ M → $N\times M$ across vol1/vol2
- Add revert_times_multipliers.py (one-off) and fix_times_math.py (dimension-only)
- Update book-prose guidelines in .claude/rules (dimension vs multiplier)
2026-02-23 14:51:24 -05:00
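The following sketch illustrates the dimension/multiplier split and assumes plain decimal numbers on both sides of the sign. The helper is hypothetical and is neither fix_times_math.py nor revert_times_multipliers.py:

```python
import re

NUM = r"\d+(?:\.\d+)?"
# "$224\times$ 224" (malformed split span) -> "$224\times 224$"
DIM_MALFORMED = re.compile(rf"\$({NUM})\\times\$\s*({NUM})")
# "224×224" (a number on both sides of a Unicode ×) -> one math span
DIM_UNICODE = re.compile(rf"({NUM})\s*×\s*({NUM})")
# "10×" not followed by a number -> multiplier form "10$\times$"
MULT_UNICODE = re.compile(rf"({NUM})×(?!\s*\d)")


def normalize_times(text: str) -> str:
    # Dimensions first, so "224×224" is never mistaken for a multiplier.
    text = DIM_MALFORMED.sub(r"$\1\\times \2$", text)
    text = DIM_UNICODE.sub(r"$\1\\times \2$", text)
    return MULT_UNICODE.sub(r"\1$\\times$", text)
```

Already-correct forms such as `$224\times 224$` and `10$\times$` pass through unchanged.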
Vijay Janapa Reddi
73a956a09b chore(volumes,vscode-ext): batch volume updates and tooling improvements
Checkpoint the branch-wide content/config revisions together with workbench enhancements so chapter rendering and developer workflows stay aligned. This captures the current validation-driven formatting and parallel build/debug improvements in one commit.
2026-02-15 14:03:27 -05:00
Vijay Janapa Reddi
e3cc9f7af3 refactor: rename ml_ml_workflow files, consolidate CLI, and clean up scripts
Remove redundant ml_ prefix from ml_workflow chapter files and update all
Quarto config references. Consolidate custom scripts into native binder
subcommands and archive obsolete tooling.
2026-02-13 11:06:28 -05:00
Vijay Janapa Reddi
2390c3ab31 Refactor: consolidate Quarto config layers and content reorganization.
Unifies Quarto metadata into shared base/format/volume fragments while carrying through chapter path, asset, and tooling updates to keep the repository consistent and easier to maintain.
2026-02-12 15:38:55 -05:00
Vijay Janapa Reddi
c015b9d80a Refactor: stabilize non-PDF build workflows and semantic editor cues.
Standardize Quarto config/style handling for HTML/EPUB volume builds, add explicit binder reset commands by format, and align QMD reference/label highlighting so structural tokens share consistent visual semantics.
2026-02-11 20:36:16 -05:00
Vijay Janapa Reddi
9797b74707 Improves figure list generation for book builds
Refactors figure list generation to reliably locate and clear LaTeX manifest files.

- Searches for the figure manifest in both the quarto root and build output directory,
  handling cases where the post-render step moves the file.
- Clears stale manifests from both locations to avoid incorrect figure counts from
  previous builds.
- Moves the LaTeX manifest to the build output directory to keep the source
  tree clean.
- Updates the merge script to find the manifest dynamically.

This prevents issues where figure counts are mismatched due to outdated or
missing manifest files.
2026-02-09 15:33:45 -05:00
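The following minimal sketch shows the two-location lookup. It assumes a single manifest file name, where `figure_manifest.txt` is a placeholder rather than the real name, and all helper names are hypothetical:

```python
import shutil
from pathlib import Path
from typing import List, Optional

MANIFEST_NAME = "figure_manifest.txt"  # hypothetical file name


def manifest_locations(quarto_root: Path, build_dir: Path) -> List[Path]:
    # The post-render step may already have moved the file, so check both.
    return [quarto_root / MANIFEST_NAME, build_dir / MANIFEST_NAME]


def find_manifest(quarto_root: Path, build_dir: Path) -> Optional[Path]:
    """Return the first manifest that exists, or None if neither does."""
    for path in manifest_locations(quarto_root, build_dir):
        if path.is_file():
            return path
    return None


def clear_manifests(quarto_root: Path, build_dir: Path) -> None:
    """Delete stale manifests from both locations before a new build."""
    for path in manifest_locations(quarto_root, build_dir):
        path.unlink(missing_ok=True)


def move_to_build(quarto_root: Path, build_dir: Path) -> Optional[Path]:
    """Move the manifest out of the source tree into the build directory."""
    src = quarto_root / MANIFEST_NAME
    if not src.is_file():
        return find_manifest(quarto_root, build_dir)
    build_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(build_dir / MANIFEST_NAME)))
```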
Vijay Janapa Reddi
a517ef5df6 refactor: editorial improvements across vol1 chapters
Training and Frameworks chapters restructured for clarity.
Data Selection chapter expanded. Header-includes.tex updated.
Various minor fixes across all chapter files.
2026-02-07 20:35:14 -05:00
Vijay Janapa Reddi
29fabf35c1 Fix figure list to handle appendix figures and exclude shelved files
Two issues:
1. LaTeX parser regex only matched numeric figure numbers (e.g., 1.1)
   but appendices use letter prefixes (B.1, C.2, D.1). Changed \d+ to
   [A-Z\d]+ so all 214 figures are captured.
2. --scan-all mode picked up _shelved QMD files that aren't in the
   actual build, causing a count mismatch. Added _shelved to skip list.
2026-02-04 15:14:36 -05:00
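The regex change and the skip list can be sketched as follows. The anchored patterns and the `_shelved` prefix check, which covers both directories and file names, are assumptions about how the script matches, not its actual code:

```python
import re
from pathlib import Path

# Old: numeric chapter part only, so appendix figures like B.1 were missed.
OLD_FIG_NUM = re.compile(r"^(\d+)\.(\d+)$")
# New: the chapter part may also be an appendix letter (B.1, C.2, D.1).
FIG_NUM = re.compile(r"^([A-Z\d]+)\.(\d+)$")

SKIP_PREFIXES = ("_shelved",)  # assumed prefix for shelved chapters


def is_shelved(path: Path) -> bool:
    return any(part.startswith(SKIP_PREFIXES) for part in path.parts)


def scan_qmd(paths: list[Path]) -> list[Path]:
    """Keep only QMD files that belong to the actual build (--scan-all mode)."""
    return [p for p in paths if p.suffix == ".qmd" and not is_shelved(p)]
```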
Vijay Janapa Reddi
ac3c9ab2e5 Fix figure list regex to handle LaTeX braces and apostrophes
Three parsing bugs caused missing, truncated, or duplicated entries in the figure list:
1. div_pattern broke on LaTeX {} (e.g., $W_{hh}$, \index{...}) — fixed
   with greedy .* anchored to end-of-line
2. Caption/alt regex [^"']+ truncated at apostrophes (e.g., Moore's) —
   fixed by matching double-quote delimiters only: "([^"]*)"
3. Duplicate figures when ::: div wraps a code block — added dedup logic

Fixes applied to both generate_figure_list.py and figure_list_for_press.py.
Regenerated FIGURE_LIST_VOL1.csv: 182 figures, 0 empty captions.
2026-02-04 15:03:17 -05:00
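The following sketch shows the corrected extraction, with illustrative versions of the old patterns for comparison. These reproduce the failure modes described above but are not the original code, and all names are hypothetical:

```python
import re

# Illustrative old patterns: stop at the first "}" or at any apostrophe.
OLD_DIV = re.compile(r"^:::\s*\{([^}]*)\}")
OLD_CAP = re.compile(r"""fig-cap=["']([^"']+)["']""")
# New: greedy .* anchored to end of line, and double-quote delimiters only.
DIV_PATTERN = re.compile(r"^:::\s*\{(#fig-[\w-]+.*)\}\s*$")
CAP_PATTERN = re.compile(r'fig-cap="([^"]*)"')
ALT_PATTERN = re.compile(r'fig-alt="([^"]*)"')
FIG_ID = re.compile(r"#(fig-[\w-]+)")


def extract_figures(lines: list[str]) -> dict[str, dict[str, str]]:
    """Return {figure id: {"caption": ..., "alt": ...}}, deduplicated by id."""
    figures: dict[str, dict[str, str]] = {}
    for line in lines:
        m = DIV_PATTERN.match(line)
        if not m:
            continue
        attrs = m.group(1)
        fig_id = FIG_ID.match(attrs).group(1)
        if fig_id in figures:
            continue  # the same div can surface twice around a code block
        cap = CAP_PATTERN.search(attrs)
        alt = ALT_PATTERN.search(attrs)
        figures[fig_id] = {
            "caption": cap.group(1) if cap else "",
            "alt": alt.group(1) if alt else "",
        }
    return figures
```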
Vijay Janapa Reddi
c63e1429f2 figure listing 2026-02-04 07:25:56 -05:00
Vijay Janapa Reddi
1a36108b49 Consolidate figure list scripts into single file with --clear flag
- Merge clear_figure_cache.py into generate_figure_list.py
- Pre-render: generate_figure_list.py --clear
- Post-render: generate_figure_list.py
- A single file is easier to maintain
2026-02-04 02:17:56 -05:00
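With `argparse`, the single-entry-point layout might look as follows. `main` returns the selected mode only so the dispatch is easy to check, and the commit does not show the actual clear/generate work, so it is left as comments:

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Generate the figure list.")
    parser.add_argument(
        "--clear",
        action="store_true",
        help="pre-render mode: clear stale LaTeX figure data",
    )
    return parser.parse_args(argv)


def main(argv=None) -> str:
    args = parse_args(argv)
    if args.clear:
        # Pre-render hook: delete stale LaTeX data here (not shown).
        return "clear"
    # Post-render hook: merge LaTeX data with QMD captions here (not shown).
    return "generate"


if __name__ == "__main__":
    main()
```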
Vijay Janapa Reddi
a702f879ae Add automatic figure list generation for MIT Press
- Add pre-render hook to clear stale LaTeX data between builds
- Add post-render hook to generate FIGURE_LIST.txt in output dir
- LaTeX captures figure numbers and pages during compilation
- Use deferred write for accurate page numbers (after float placement)
- Python merges with QMD captions and alt-text
- Output automatically appears in _build/pdf-vol1/ after each build
2026-02-04 02:13:16 -05:00
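The merge step can be sketched as a join on the figure label. The LaTeX side is not shown. The tab-separated `label, number, page` manifest format and the output line format are assumptions, not the real FIGURE_LIST.txt layout:

```python
def merge_figure_data(manifest: str, captions: dict[str, dict[str, str]]) -> list[str]:
    """Join LaTeX-captured numbers and pages with QMD captions and alt-text."""
    rows = []
    for line in manifest.splitlines():
        if not line.strip():
            continue  # skip blank lines in the manifest
        fig_id, number, page = line.split("\t")
        meta = captions.get(fig_id, {})
        rows.append(
            f"Figure {number} (p. {page}): {meta.get('caption', '')} "
            f"| alt: {meta.get('alt', '')}"
        )
    return rows
```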
Vijay Janapa Reddi
05a184459d Refactors code to use constants and formulas
Replaces hardcoded numerical values with symbolic Python variables derived from defined constants and formulas.

This improves code maintainability and consistency, ensuring calculations are based on accurate and up-to-date physical values.
2026-02-03 19:48:11 -05:00
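In practice, this pattern looks like a compute cell that derives every number shown in prose. The inputs below are illustrative placeholders, not values from the book:

```python
# Illustrative inputs only; not the book's actual values.
TOKENS_PER_SECOND = 50_000
SECONDS_PER_HOUR = 60 * 60

# Derived value: computed, never hand-typed, so it tracks the inputs.
tokens_per_hour = TOKENS_PER_SECOND * SECONDS_PER_HOUR


def fmt_millions(value: float) -> str:
    """Format a count such as 180000000 as '180M' for inline prose."""
    return f"{value / 1e6:.0f}M"


tokens_per_hour_str = fmt_millions(tokens_per_hour)
```

Prose would then reference `{python} tokens_per_hour_str`, so changing `TOKENS_PER_SECOND` updates the text with no hand edits.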
Vijay Janapa Reddi
3750ee12e9 Enforce Computed Arithmetic Rule across all chapters (1,064 inline refs, 0 unresolved)
Replace every hand-typed derived number with Python-computed inline
references. Add just-in-time compute cells before prose so that changing
any input constant automatically propagates to all derived values.

Vol 1 chapters fixed: dl_primer, dnn_architectures, serving,
model_compression, hw_acceleration, benchmarking, ops, appendix_machine,
appendix_data, frameworks, data_engineering, training, ml_systems,
responsible_engr, data_selection, workflow, introduction.

Vol 2 chapters fixed: distributed_training, inference, infrastructure,
storage, sustainable_ai, fault_tolerance, ops_scale, edge_intelligence,
ai_for_good, privacy_security.

Key corrections caught by forcing computation:
- training.qmd carbon footprint: 64 GPUs → 1024 GPUs (original was
  mathematically impossible for 7B params × 1T tokens)
- hw_acceleration.qmd systolic energy: 10 pJ/250× → 11 pJ/233× (exact)
- hw_acceleration.qmd GPT-2 utilization: 0.6% → 0.7% (exact)
- serving.qmd tokens/hour: ~190M → ~192M (exact)

Also adds calc/validate_inline_refs.py pre-render guardrail and
extends calc/viz.py with Harvard Crimson plotting palette.
2026-02-01 11:13:42 -05:00
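A guardrail in the spirit of calc/validate_inline_refs.py could be sketched as follows. The sketch assumes the rule is that a name must be assigned in an earlier `{python}` cell, and it checks only inline references that are a bare variable name. `unresolved_refs` is a hypothetical name:

```python
import ast
import re

# Fenced {python} cells; the capture group holds the cell source.
CELL = re.compile(r"```\{python\}(.*?)```", re.DOTALL)
# Inline references of the simple form `{python} name`.
INLINE_REF = re.compile(r"`\{python\}\s+([A-Za-z_]\w*)\s*`")


def assigned_names(source: str) -> set[str]:
    """Names bound anywhere in a cell (assignments, loop targets, unpacking)."""
    return {
        node.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }


def unresolved_refs(qmd: str) -> list[str]:
    """Inline refs whose name was not assigned in an earlier cell."""
    defined: set[str] = set()
    missing: list[str] = []
    # split() with one capture group alternates prose, cell, prose, cell, ...
    for i, chunk in enumerate(CELL.split(qmd)):
        if i % 2 == 1:
            defined |= assigned_names(chunk)
        else:
            missing += [n for n in INLINE_REF.findall(chunk) if n not in defined]
    return missing
```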
Vijay Janapa Reddi
592edefb00 refactor: update chapter section ID mappings in build scripts
Update fix_cross_references.py and generate_glossary.py to reflect
the renamed chapter sections in Vol 1 (e.g., sec-ml-systems to
sec-ml-system-architecture, sec-dl-primer to
sec-deep-learning-systems-foundations).
2026-01-26 15:56:25 -05:00
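The following sketch applies such a rename table mechanically, using the two renames named above. The whole-ID guard keeps `sec-dl-primer` from matching inside a longer ID, and `remap_section_ids` is a hypothetical helper:

```python
import re

# The two example renames from this commit message.
SECTION_RENAMES = {
    "sec-ml-systems": "sec-ml-system-architecture",
    "sec-dl-primer": "sec-deep-learning-systems-foundations",
}

# Whole IDs only: not preceded or followed by a word character or hyphen.
_ID = re.compile(
    r"(?<![\w-])(" + "|".join(map(re.escape, SECTION_RENAMES)) + r")(?![\w-])"
)


def remap_section_ids(text: str) -> str:
    return _ID.sub(lambda m: SECTION_RENAMES[m.group(1)], text)
```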
Vijay Janapa Reddi
9781727d60 refactor: rename advanced_intro to introduction and update scripts
- Renamed vol2/advanced_intro to vol2/introduction for consistency
- Updated all scripts and configs to use vol1/ instead of core/
- Updated pre-commit config to check all contents/ not just vol1/
- Updated path references in Lua filters, Python scripts, and configs
2026-01-01 14:46:52 -05:00
Vijay Janapa Reddi
853eb03ee8 style: apply consistent whitespace and formatting across codebase 2025-12-13 14:05:34 -05:00
Vijay Janapa Reddi
7b92e11193 Repository Restructuring: Prepare for TinyTorch Integration (#1068)
* Restructure: Move book content to book/ subdirectory

- Move quarto/ → book/quarto/
- Move cli/ → book/cli/
- Move docker/ → book/docker/
- Move socratiQ/ → book/socratiQ/
- Move tools/ → book/tools/
- Move scripts/ → book/scripts/
- Move config/ → book/config/
- Move docs/ → book/docs/
- Move binder → book/binder

Git history fully preserved for all moved files.

Part of repository restructuring to support MLSysBook + TinyTorch.

Pre-commit hooks were bypassed for this commit because their paths still needed updating.

* Update pre-commit hooks for book/ subdirectory

- Update all quarto/ paths to book/quarto/
- Update all tools/ paths to book/tools/
- Update config/linting to book/config/linting
- Update project structure checks

Pre-commit hooks will now work with the new directory structure.

* Update .gitignore for book/ subdirectory structure

- Update quarto/ paths to book/quarto/
- Update assets/ paths to book/quarto/assets/
- Maintain all existing ignore patterns

* Update GitHub workflows for book/ subdirectory

- Update all quarto/ paths to book/quarto/
- Update cli/ paths to book/cli/
- Update tools/ paths to book/tools/
- Update docker/ paths to book/docker/
- Update config/ paths to book/config/
- Maintain all workflow functionality

* Update CLI config to support book/ subdirectory

- Check for book/quarto/ path first
- Fall back to quarto/ for backward compatibility
- Maintain full CLI functionality

* Create new root and book READMEs for dual structure

- Add comprehensive root README explaining both projects
- Create book-specific README with quick start guide
- Document repository structure and navigation
- Prepare for TinyTorch integration
2025-12-05 14:04:21 -08:00
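The CLI lookup order can be sketched as follows. `find_quarto_dir` is a hypothetical name, not the CLI's actual function:

```python
from pathlib import Path
from typing import Optional


def find_quarto_dir(repo_root: Path) -> Optional[Path]:
    """Prefer the new book/quarto/ layout, fall back to the legacy quarto/."""
    for candidate in (repo_root / "book" / "quarto", repo_root / "quarto"):
        if candidate.is_dir():
            return candidate
    return None
```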