Commit Graph

10643 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
417f47722a fix(pdf): align Vol 2 reference/citation location with Vol 1 (document) 2026-02-21 15:59:29 -05:00
Vijay Janapa Reddi
218ad6ad93 style(tables): use • and <br> for list cells in pipe tables
- edge_intelligence: Constraint-Solution Mapping (•&nbsp; → •<br>)
- security_privacy: Defense Selection Framework (- Item → •<br>)
- sustainable_ai: Critical Materials (semicolons → •<br>)

Consistent with hw_acceleration pattern for HTML/PDF/EPUB.
2026-02-21 15:58:32 -05:00
Vijay Janapa Reddi
b4c86de3f1 refactor(figures): div wrapper pattern, responsive CSS, remove redundant labels
- Wrap Python figures in div with fig-env, fig-pos, fig-cap, fig-alt
- Remove #| label from blocks when div has #fig-xxx
- Add responsive figure CSS (max-width, height: auto) for HTML
- Add figure headers to Python blocks (Context, Goal, Show, How, Imports, Exports)
2026-02-21 15:52:26 -05:00
Vijay Janapa Reddi
99661315eb feat(pdf): pipe table cell line breaks with <br> and makecell
- Add Lua filter to convert <br> in table cells to \makecell for PDF
- Use pipe table with • and <br> in hw_acceleration hardware evolution table
- Add makecell package; set arraystretch to 1.6; top-align makecell cells
- Register filter in PDF config
2026-02-21 15:50:37 -05:00
Vijay Janapa Reddi
eaa545f115 docs: add documentation-style headers to Python figure blocks
Add Context/Goal/Show/How/Imports/Exports headers to all Python figure
blocks (#| label: fig-*) in Vol 1 and Vol 2, matching the setup-block
pattern. Headers placed after Quarto options and before imports.
2026-02-21 14:44:24 -05:00
Vijay Janapa Reddi
b756cb7da3 fix: vscode extension activation error handling and workspace detection
- extension.ts: wrap activate() body in try/catch so activation failures
  surface in the Output channel instead of crashing silently
- workspace.ts: return undefined when no book/binder marker is found
  instead of returning the first workspace folder unconditionally
2026-02-21 14:33:43 -05:00
Vijay Janapa Reddi
9e809d21c4 feat: full-stack Pint robustness and class-based namespace isolation
Python library (mlsys/):
- constants.py: add ureg.default_format, set_application_registry, MS alias comment
- formatting.py: isinstance checks, add fmt_full(), fmt_split(), .m_as() modernization
- formulas.py: fleet formulas return Quantity, @ureg.check() decorators, .m_as() everywhere
- hardware.py: dimension-first validation in __post_init__, Quantity[float] annotations
- models.py: __post_init__ dimension checks, size_in_bytes() enforcement, ureg.count→ureg.param
- test_units.py: +50 robustness tests (wrong-unit HardwareSpec, fleet formulas, fmt_full)
- validate_pint_usage.py: new static analysis script for Pint anti-patterns in QMD files
- transform_pico_cells.py: transformation script for PICO cell restructuring

QMD chapters (Vol1 + Vol2 — all 43 chapters with Python cells):
- Wrapped all Python compute cells in class-based namespace isolation (PICO pattern)
- Added EXPORTS bridges so class-internal values are accessible to prose inline Python
- Modernized .to(unit).magnitude → .m_as(unit) throughout
- Removed bare .magnitude calls; all unit extractions now explicit
- Fleet appendices (appendix_fleet, appendix_communication, appendix_reliability):
  full Quantity-return cascade for MTBF, AllReduce, Young-Daly, checkpoint formulas

All 43 chapters verified building cleanly (HTML) after changes.
2026-02-21 14:33:36 -05:00
Vijay Janapa Reddi
b887b91a2c fix: resolve cross-cell export gaps found during comprehensive HTML build verification
After the class-based namespace isolation pass, missing EXPORTS bridge
variables were discovered by running all chapters through the HTML build pipeline.

Vol1 fixes:
- nn_computation: add hog_grid_str/hog_bins_str exports; convert generator
  expressions to for-loops (in Python 3, a generator expression body in a
  class cannot see class-level names); add mnist_large/small_l1/l2 exports
  for footnote inline Python
- ml_systems: add cloud_compute/memory/ai_frac, mobile_tops/bw/ratio/
  bottleneck/compute/memory_frac, cloud_thresh_bw_str, edge_thresh_bw_str
  exports; complete ResnetMobile EXPORTS section
- data_selection: fix FpScalingCalc invariant (min_samples_threshold 50→150
  so 100 expected rare samples < 150 threshold holds true)
- model_compression: FusionCalc bandwidth_reduction invariant 50→40%
- nn_architectures: add 'param' unit to lighthouse-table-specs imports

Vol2 fixes:
- data_storage: add missing 'watt' import to chapter setup cell
- fault_tolerance: export per_node_gbs raw float for prose arithmetic
- appendix_fleet: export rho_7b raw float for fmt() call in prose
- appendix_c3: add .magnitude to calc_effective_flops() result (returns
  Quantity since formulas.py upgrade, not raw float)
- appendix_reliability: wrap worked-example-young-daly in class with EXPORTS

All 43 chapters with Python cells verified passing after fixes.
2026-02-21 14:20:43 -05:00
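
The class-scope issue noted in b887b91a2c can be reproduced in a few lines. This is a minimal sketch, not code from the chapters: the class names `Broken` and `Fixed` and the attributes `scale`, `bins`, and `total` are hypothetical. It shows that a generator expression inside a class body cannot read a class-level name from its body (only the outermost iterable is evaluated in class scope), while an equivalent for-loop can.

```python
def build_with_generator():
    # Hypothetical class: the generator body looks up `scale` in the
    # enclosing function and global scopes, skipping the class namespace.
    class Broken:
        scale = 10
        bins = [1, 2, 3]
        total = sum(b * scale for b in bins)  # raises NameError on `scale`
    return Broken


def build_with_loop():
    # Same computation as a plain for-loop, which runs directly in class scope.
    class Fixed:
        scale = 10
        bins = [1, 2, 3]
        total = 0
        for b in bins:
            total += b * scale
    return Fixed


try:
    build_with_generator()
    generator_failed = False
except NameError:
    generator_failed = True

fixed_total = build_with_loop().total
print(generator_failed, fixed_total)
```

One side effect of the loop form is that the loop variable (`b` here) remains behind as a class attribute.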
Vijay Janapa Reddi
5677633b4c Update symlinks to point to vol1 build config after Vol1 build run 2026-02-21 10:51:05 -05:00
Vijay Janapa Reddi
edb2dd17b0 Fix Vol1 standalone PDF build errors across 4 chapters
- hw_acceleration: escape % in callout title 'The Five-Percent Utilization Mystery'
  (LaTeX treats % as comment char in div attribute titles, truncating the box)
- data_selection: escape % in callout title 'The Ninety-Nine Percent Sparsity Trap'
  (same \fbxSimple runaway argument error)
- model_compression: remove 28-line orphaned stale class body (merge artifact);
  add missing mat_dim=4096 to LowRankFactorization class parameters
- model_serving: move littles-law-calc code cell before the prose that references
  its exported variables (serving_qps_str etc. used before they were defined)
2026-02-21 10:48:12 -05:00
Vijay Janapa Reddi
35cc915041 fix: ensure all 36 Vol2 chapters build as standalone PDFs
Fixes a series of LaTeX/Pandoc compilation errors across Vol2 so every
chapter builds cleanly with `binder build pdf <chapter> --vol2 -v`.

Key fixes applied:

- Citations removed from fig-caps, table cells, and footnote definitions
  (Quarto 1.8 `marginCitePlaceholderInlineWithProtection` bug with
  `citation-location: margin`); citations restored to surrounding prose
- TikZ nodes with `\\` line breaks given `align=center/left` to exit
  LR mode (robust_ai, sustainable_ai)
- `\argmax` → `\operatorname{arg\,max}` (`\argmax` is not defined by amsmath)
- `\texorpdfstring` wrapping for math in section headers (notation)
- Multi-line `{python}` inline expressions in grid tables converted to
  pipe tables (appendix_communication)
- Math expressions split across grid table row boundaries converted to
  pipe tables to avoid `\{\beta\}\$` rendering corruption
- Stale class references (`ImageNetBottleneck`, `PrefetchBuffer`,
  `CheckpointStorage`) fixed → `StorageEconomics.*` (data_storage)
- Missing `batch_per_gpu` factor in aggregate bandwidth formula (data_storage)
- Duplicate `xytext` keyword in `ax.annotate()` call (edge_intelligence)
- `&lt;` HTML entity mixed with unescaped `$` in table cells fixed (security_privacy)
- Incorrect `check()` invariant corrected (appendix_fleet)
2026-02-21 09:44:46 -05:00
github-actions[bot]
8c373dfc58 docs: add @Pratham-ja as tinytorch contributor for code, bug 2026-02-21 14:42:31 +00:00
Vijay Janapa Reddi
94d079b57c Merge feature/tinytorch-core into dev (fixes #1184) 2026-02-21 09:39:50 -05:00
Vijay Janapa Reddi
d7d288dace Fix UnicodeDecodeError on Windows in `tito module complete` (fixes #1184)
Add encoding='utf-8' and errors='replace' to subprocess.run() calls in
workflow.py so unit and integration test output decode correctly on
Windows (cp1252) when output contains UTF-8 characters.

Co-authored-by: Pratham-ja <114498234+Pratham-ja@users.noreply.github.com>
2026-02-21 09:38:08 -05:00
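
The decoding fix described in d7d288dace can be sketched as follows. This is an illustration under assumptions, not the contents of workflow.py: the helper name `run_and_capture` and the child commands are hypothetical. Passing `encoding="utf-8"` makes `subprocess.run()` decode output as UTF-8 regardless of the platform default (cp1252 on many Windows setups), and `errors="replace"` substitutes U+FFFD for undecodable bytes instead of raising.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return its stdout decoded as UTF-8 (hypothetical helper)."""
    result = subprocess.run(
        cmd,
        capture_output=True,
        encoding="utf-8",   # do not rely on the locale's default encoding
        errors="replace",   # undecodable bytes become U+FFFD, no exception
    )
    return result.stdout


# Child process that writes a UTF-8 check mark as raw bytes.
utf8_child = [
    sys.executable, "-c",
    "import sys; sys.stdout.buffer.write('\\u2713 passed'.encode('utf-8'))",
]
# Child process that writes a byte that is not valid UTF-8.
bad_child = [
    sys.executable, "-c",
    "import sys; sys.stdout.buffer.write(b'ok \\xff')",
]

utf8_output = run_and_capture(utf8_child)
bad_output = run_and_capture(bad_child)
print(repr(utf8_output), repr(bad_output))
```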
Vijay Janapa Reddi
fc093ab8de Merge pull request #1194 from harvard-edge/fix/ch5
Update figure in chapter 5
2026-02-21 09:30:09 -05:00
Zeljko Hrcek
678a218372 Update figure in chapter 5 2026-02-21 15:20:35 +01:00
Vijay Janapa Reddi
62b98edee1 Updates book content and configuration
Refines book abstracts, table of contents, and diagram configurations for improved clarity and structure.

This commit enhances the descriptions of both Volume I and Volume II, emphasizing their respective focuses. It also introduces a framework decision tree to guide the selection of parallel training strategies and inference frameworks, and diagrams for visualizing hardware constraints.
2026-02-21 08:19:01 -05:00
Vijay Janapa Reddi
0614676798 Adds PDF config for Volume II of the book
Creates a YAML configuration file specifically for generating the PDF version of Volume II: Machine Learning Systems at Scale.

This configuration defines the project structure, book metadata (title, author, abstract), chapter organization, and PDF-specific settings like cover page design, table of contents depth, and inclusion of LaTeX files for custom styling.

This allows for independent building and customization of the PDF output for Volume II.
2026-02-21 08:17:13 -05:00
Vijay Janapa Reddi
9e35563d00 Merge remote-tracking branch 'origin/feature/book-volumes' into feature/book-volumes 2026-02-21 08:16:50 -05:00
Vijay Janapa Reddi
c68ca02d9e Enhances data pipeline debugging flowchart
Improves the data pipeline debugging flowchart by adding visual cues.

These cues help to highlight the type of data issue being investigated
and make the flowchart easier to understand.
2026-02-21 08:15:29 -05:00
Vijay Janapa Reddi
87ffaf288d Refines content for Volume 1 conclusion
Enhances the conclusion of Volume 1, improving clarity and flow by:

- Refining wording and structure for better readability
- Clarifying the connection between theoretical invariants and practical applications
- Adding information for clarity and context
2026-02-21 07:59:34 -05:00
Vijay Janapa Reddi
718f867039 Vol1: improve book abstracts and chapter content
- Config: academic, standalone abstracts for PDF/EPUB/copyedit
- Chapters: ml_systems, nn_architectures, nn_computation, training
2026-02-21 06:58:22 -05:00
Zeljko Hrcek
96efa3bf29 Merge pull request #1190 from Zeljko-Hrcek/fix/training
Update training chapter and add missing color definition
2026-02-21 09:55:21 +01:00
Zeljko Hrcek
ae2ef83dd3 Merge branch 'feature/book-volumes' into fix/training 2026-02-21 09:54:24 +01:00
Zeljko Hrcek
403994ea2e Update training chapter and add missing color definition 2026-02-21 09:28:04 +01:00
Vijay Janapa Reddi
09602445de chore: update book content, config, appendices, and tooling
- Vol1: chapter updates across backmatter, benchmarking, data, frameworks, etc.
- Vol2: content updates, new appendices (assumptions, communication, fleet, reliability)
- Quarto: config, styles, formulas, constants
- Add SEMINAL_PAPERS_V2.md, learning_objectives_bolding_parallel.sh
- VSCode extension: package.json, chapterNavigatorProvider
- Landing page and docs updates
2026-02-20 18:55:24 -05:00
Vijay Janapa Reddi
b5a9e590db Standardizes P.I.C.O. code blocks and consolidates specifications across Volume 2
Audits and refactors Volume 2 chapters to ensure all Python calculation cells adhere to the P.I.C.O. (Parameters, Invariants, Calculation, Outputs) standard.

- Consolidates storage specifications and economics into StorageSetup and StorageEconomics classes in data_storage.qmd.
- Refactors collective communication math into the AllReduceCost class in collective_communication.qmd.
- Standardizes infrastructure and performance engineering setups in compute_infrastructure.qmd and performance_engineering.qmd.
- Corrects NameErrors and missing imports in benchmarking and platform ROI calculations.
- Ensures all prose variables are correctly exported and scoped within Safe Class Namespaces to prevent global pollution and ensure mathematical consistency across the fleet-scale narrative.
2026-02-20 16:53:20 -05:00
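
A rough sketch of what a P.I.C.O. cell with class-based scoping might look like, assuming only the four-part layout named in b5a9e590db. The class name `CheckpointSizeSketch`, its values, and the exported variable name are all hypothetical; the actual chapter cells are not shown in this log.

```python
class CheckpointSizeSketch:
    """Hypothetical P.I.C.O. cell: size of a model checkpoint."""

    # Parameters (illustrative values, not taken from the book)
    n_params = 7_000_000_000
    bytes_per_param = 2

    # Invariants: fail at render time if the parameters are implausible
    assert n_params > 0
    assert bytes_per_param in (1, 2, 4)

    # Calculation
    size_bytes = n_params * bytes_per_param
    size_gb = size_bytes / 1e9

    # Outputs: pre-formatted strings for use in prose
    size_gb_str = f"{size_gb:.0f}"


# Export bridge: only the values the prose needs reach module scope, so
# intermediate names such as size_bytes do not pollute the global namespace.
checkpoint_size_gb_str = CheckpointSizeSketch.size_gb_str
print(checkpoint_size_gb_str)
```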
Vijay Janapa Reddi
abc7ef01d8 Fixes broken P.I.C.O. code blocks and missing imports across Volume 1
Audits all Volume 1 chapters to identify and repair structural errors in Python calculation cells introduced during the P.I.C.O. refactor.

- Consolidates redundant memory calculations and fixes missing imports in nn_computation.qmd.
- Refactors AttentionMemory in nn_architectures.qmd to resolve NameErrors and duplicated blocks.
- Cleans up QuantizationSpeedup and restores MobileNetCompressionAnchor in model_compression.qmd.
- Resolves missing Models and Hardware imports in benchmarking.qmd.
- Updates LighthouseModels in ml_systems.qmd with missing variables for MobileNet and KWS.
- Corrects indentation and structural integrity across all Volume 1 calculation scenarios to ensure valid rendering and mathematical consistency.
2026-02-20 15:43:42 -05:00
Vijay Janapa Reddi
69f46d4f7e Clarifies memoization computation savings
Refines the explanation of K,V computation savings in the memoization module,
quantifying redundant computations and highlighting the efficiency gain.

The paper and module now specify that generating 100 tokens without
caching performs 5,050 K,V computations in total, while only 100 are
necessary, leaving 4,950 redundant calculations.
2026-02-19 17:59:10 -05:00
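
The figures quoted in 69f46d4f7e can be checked with a short calculation. The sketch assumes that, without caching, generation step t recomputes K,V for all t tokens seen so far, which is the reading that reproduces the numbers in the commit message; the function name `kv_computations` is hypothetical.

```python
def kv_computations(num_tokens):
    """Return (without_cache, with_cache, redundant) K,V computation counts."""
    # Without a cache, step t recomputes K,V for tokens 1..t: 1 + 2 + ... + n.
    without_cache = sum(range(1, num_tokens + 1))
    # With a cache, each token's K,V is computed exactly once.
    with_cache = num_tokens
    return without_cache, with_cache, without_cache - with_cache


total, needed, redundant = kv_computations(100)
print(total, needed, redundant)
```

The uncached total is the triangular number n(n + 1)/2, so the redundant work grows quadratically with sequence length while the necessary work grows linearly.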
github-actions[bot]
68167c3d1b docs: add @Pratham-ja as tinytorch contributor for doc 2026-02-19 22:43:52 +00:00
Vijay Janapa Reddi
0eec623b70 Merge pull request #1183 from Pratham-ja/bugfix/fix-ascii-graphs
Improve activation graph visualization in Module 02
2026-02-19 17:40:09 -05:00
unknown
5f7a696077 Improve activation graph visualization in Module 02
- Clarify node labeling
- Improve spacing for readability
- No API changes
2026-02-20 03:48:10 +05:30
Vijay Janapa Reddi
b6b2c94988 Refactors Volume II content and structure
Restructures Volume II to improve narrative flow and address scale impediments, including reordering of sections and addition of introductory material.

Introduces "Master Map" to guide readers through the volume's layered progression.

Adds callout notes to bridge concepts between sections.

Moves references.qmd to backmatter and adjusts chapter organization for clarity.

Updates hardware parameterization and network performance modeling within code blocks.
2026-02-19 14:39:54 -05:00
Vijay Janapa Reddi
45f46ad70d Reinforces key concepts with concrete examples
Deepens understanding of abstract principles by adding concrete examples and numerical anchors.

These additions provide tangible context and illustrate the practical implications of the discussed concepts, which aids in comprehension and application. It also adds context to constraints, economics and performance.
2026-02-19 14:35:48 -05:00
Vijay Janapa Reddi
13b29eb0ea Refactors concept maps for volume 1 chapters
Updates concept map YAML files for various chapters in volume 1, including introduction, benchmarking, data engineering, data selection, frameworks, hardware acceleration, ML systems, MLOps, ML workflow, model serving, NN architectures, NN computation, optimizations, responsible engineering, and training.

Replaces the old YAML structure with a new structure that focuses on primary and secondary concepts, technical terms, methodologies, and formulas. The change emphasizes the core concepts and their relationships within each chapter. The generated dates are updated to reflect a future date.
2026-02-19 13:49:04 -05:00
Vijay Janapa Reddi
e11ad3d44c Strengthen Vol1 intellectual spine with nine micro-insertions across 12 chapters
Insert thesis declarations, spine reconnections, and evidence elevations
that make the book's central claim explicit: ML systems engineering is a
distinct discipline governed by permanent physical laws. No restructuring
or deletions; insertions only, matching the surrounding rhetorical register.
2026-02-19 13:03:05 -05:00
Vijay Janapa Reddi
9942b21fb3 fix: remove ngbolin from book contributors (was incorrectly added by re-triggered bot)
ngbolin was correctly added to tinytorch (PR #1180), but the edited-comment
re-trigger on PR #1181 ran the old LLM code, which hallucinated ngbolin as
the username instead of pipme.
2026-02-19 12:58:40 -05:00
Vijay Janapa Reddi
b6bd4adfcc fix: correct @pipme username (was misspelled as pipmea by LLM bot) 2026-02-19 12:51:52 -05:00
github-actions[bot]
6d9095d021 Update contributors list [skip ci] 2026-02-19 17:08:12 +00:00
Vijay Janapa Reddi
2bdebdae22 merge: bring workflow fix from main (regex username extraction) 2026-02-19 12:05:33 -05:00
Vijay Janapa Reddi
7a5da798dd fix(ci): extract contributor username via regex instead of LLM
The LLM (llama3.1:8b) was hallucinating usernames — e.g. returning
"pipmea" instead of "pipme". Since the username is always present as
an @mention in the trigger line, extract it deterministically via regex
in Step 1 and only use the LLM to classify contribution types.
2026-02-19 12:05:23 -05:00
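
The deterministic extraction described in 7a5da798dd might look like the sketch below. The exact wording of the trigger line and the workflow's real regex are not shown in this log, so the sample lines, the pattern, and the function name `extract_username` are assumptions. The pattern allows letters, digits, and hyphens, up to 39 characters, which matches GitHub's username rules.

```python
import re

# First @mention in the line: starts with an alphanumeric character, followed
# by up to 38 alphanumerics or hyphens. The lookbehind skips e-mail addresses.
MENTION_RE = re.compile(r"(?<!\w)@([A-Za-z0-9][A-Za-z0-9-]{0,38})")


def extract_username(trigger_line):
    """Return the first @mentioned username, or None if there is none."""
    match = MENTION_RE.search(trigger_line)
    return match.group(1) if match else None


# Hypothetical trigger lines, modelled on the bot commit subjects in this log.
print(extract_username("please add @pipme for doc"))
print(extract_username("add @Pratham-ja for code, bug"))
print(extract_username("no mention in this line"))
```

Because the username is copied from the text rather than generated, a misspelling such as "pipmea" cannot be introduced at this step.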
github-actions[bot]
7c2a6d6a0a docs: add @ngbolin as book contributor for doc 2026-02-19 17:03:30 +00:00
RinZ27
7d2cd5a47d Improve robustness of dataset extraction by validating paths 2026-02-19 22:57:20 +07:00
Vijay Janapa Reddi
717dcebc31 Consolidate bib files and rename responsible_engineering to responsible_ai
- Remove 17 empty per-chapter .bib files (all contained only newlines)
- Consolidate HTML and EPUB configs to use central backmatter/references.bib
  (matching the pattern already used by PDF configs and Vol1)
- Rename responsible_engineering/ to responsible_ai/ for consistency with
  robust_ai/ and sustainable_ai/ in Part IV: The Responsible Fleet
- Update all 4 Quarto config files with new path
2026-02-19 09:39:07 -05:00
Vijay Janapa Reddi
3c40d1288b Restructures Vol2 from 5-part/19-chapter to 4-part/16-chapter Fleet Stack architecture
Major structural reorganization of Volume II:
- New 4-part structure: The Fleet, Distributed ML, Deployment at Scale, The Responsible Fleet
- Fleet Stack framework (Infrastructure/Distribution/Serving/Governance) replaces Systems Sandwich
- Renamed and reorganized 8 chapter directories to match new structure
- Absorbed ai_good/ into responsible_engineering and emerging_challenges/ into introduction
- Wrote/expanded 6 new chapters (collective_communication, compute_infrastructure,
  fleet_orchestration, network_fabrics, data_systems, performance_engineering)
- Fixed 116+ broken @sec- cross-references across all 16 chapters and glossary
- Updated all 4 Quarto config files, part-openers, and summaries.yml
- Added \mlfleetstack LaTeX command for PDF rendering
- Removed old 5-part HTML artifacts and macOS resource fork files
- Converted grid tables to pipe tables in fleet_orchestration
- Fixed inline Python in display math blocks in collective_communication
- Resolved duplicate tbl-tco-comparison label and stale part key reference
2026-02-19 09:35:37 -05:00
github-actions[bot]
a8b3e3a29c docs: add @pipmea as book contributor for doc 2026-02-19 14:10:55 +00:00
Vijay Janapa Reddi
d061df5a75 Merge pull request #1181 from pipme/patch-1
Fix PDF download link in README.md
2026-02-19 09:07:42 -05:00
pipme
24fb275abd Fix PDF download link in README.md 2026-02-19 15:15:10 +02:00
Vijay Janapa Reddi
739b48622f Add war story callout with proper icon formats and supporting files
- Add war story callout definition in custom-numbered-blocks.yml
- Create war story icon in all three formats (SVG, PNG, PDF) matching
  the 64x64 stroke-only style used by all other callout icons
- Add war story bibliography and PDF config entry
- Add first war story ("The Quadratic Wall") in nn_architectures
- Include icon conversion utility script
2026-02-19 07:38:16 -05:00
Salman Muin Kayser Chishti
07d2751b6a Upgrade GitHub Actions to latest versions
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2026-02-19 09:20:11 +00:00