Commit Graph

9937 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
4ae406160d feat: add Quarto equation labels and cross-references across Vol 1
Add proper equation labels ({#eq-...}) and prose references (@eq-...)
to 138 equations across 15 Volume 1 chapters following the gold-standard
pattern from serving.qmd.

Key changes:
- Label all display math equations with {#eq-kebab-case-name}
- Add @eq-name references in prose before each equation
- Equations include: Iron Law, Amdahl's Law, Roofline Model,
  activation functions, backpropagation, attention mechanisms,
  queuing theory, quantization, and system throughput formulas
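The labeling pattern above can be checked mechanically. What follows is a minimal sketch. It assumes display math is written as `$$ ... $$ {#eq-name}` and that prose references use `@eq-name`. The name `check_equation_refs` is hypothetical; this is not one of the repository's validation scripts.

```python
import re

# Closing "$$" followed by a Quarto-style equation label, e.g. "$$ {#eq-amdahl}".
LABEL_RE = re.compile(r"\$\$\s*\{#(eq-[a-z0-9-]+)\}")
# Prose cross-reference, e.g. "@eq-amdahl".
REF_RE = re.compile(r"@(eq-[a-z0-9-]+)")


def check_equation_refs(text: str) -> list[str]:
    """Return labels of equations that are not referenced before they appear."""
    missing = []
    for match in LABEL_RE.finditer(text):
        label = match.group(1)
        refs_before = set(REF_RE.findall(text[: match.start()]))
        if label not in refs_before:
            missing.append(label)
    return missing


doc = """As @eq-amdahl shows:

$$
S = \\frac{1}{(1 - p) + p / n}
$$ {#eq-amdahl}

$$
x = y
$$ {#eq-orphan}
"""
print(check_equation_refs(doc))  # ['eq-orphan']
```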

Also includes:
- PDF formatting improvements (newpage directives for Vol 2)
- LaTeX header updates for chapter styling
- Pre-commit config and validation script updates
2026-02-07 09:40:01 -05:00
Vijay Janapa Reddi
8f7cbbd58e feat: add Harris & Harris-style chapter opening design
- Add large decorative chapter number in upper-right corner using TikZ
- Remove redundant "Chapter X" prefix (number serves this purpose)
- Rewrite dropcap filter to process elements in document order
  (fixes bug where filter processed all headers before any paragraphs)
- Add PDF-conditional page break before Learning Objectives in intro
- Adjust section spacing for tighter layout

Design inspired by Harris & Harris "Digital Design and Computer Architecture"
2026-02-06 16:38:14 -05:00
Vijay Janapa Reddi
7ef35ebc1c refactor: standardize Python compute cells across Volume 1
- Move 38 Python cells from inside callouts to before callouts
- Add header box formatting to all 268 compute cells (100% compliance)
- Fix unescaped dollar signs for currency values
- Fix inline Python inside LaTeX math blocks
- Update validator to exclude _str variables from false positives
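The sketch below is a heuristic version of the dollar-sign fix. It assumes a currency amount is always a `$` immediately followed by a digit. The helper `escape_currency` is hypothetical and is not the project's validator.

```python
import re

# A "$" directly followed by a digit and not already escaped is treated as currency.
CURRENCY_RE = re.compile(r"(?<!\\)\$(?=\d)")


def escape_currency(text: str) -> str:
    """Escape currency dollar signs so they are not read as math delimiters."""
    # In the replacement template, "\\" yields one literal backslash before "$".
    return CURRENCY_RE.sub(r"\\$", text)


print(escape_currency("It costs $20 per hour and $5 per run."))
# It costs \$20 per hour and \$5 per run.
```

Inline math such as `$2^n$` also starts with a digit, so a rule like this can over-match. Any changes it makes still need a human review.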

Chapters updated: serving, training, data_engineering, ml_systems,
data_selection, dl_primer, benchmarking, dnn_architectures,
hw_acceleration, model_compression, conclusion, frameworks,
introduction, ops, workflow, responsible_engr, appendix_*

Validation: 3,708 inline refs, 0 errors, 0 warnings
2026-02-06 16:06:30 -05:00
Vijay Janapa Reddi
44a61a0ab1 fix: resolve duplicate cell label 'foundation-cost-calc' in data_selection
Rename to 'foundation-amortization-data' to avoid collision with the
existing 'foundation-cost-calc' cell earlier in the chapter.
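Duplicate labels of this kind can be caught before rendering. Here is a minimal sketch that assumes labels are set with Quarto's `#| label:` cell option. The function `duplicate_cell_labels` is hypothetical.

```python
import re
from collections import Counter

# Quarto cell option line, e.g. "#| label: foundation-cost-calc".
LABEL_RE = re.compile(r"^#\|\s*label:\s*(\S+)\s*$", re.MULTILINE)


def duplicate_cell_labels(qmd_text: str) -> list[str]:
    """Return every cell label that appears more than once, sorted."""
    counts = Counter(LABEL_RE.findall(qmd_text))
    return sorted(label for label, n in counts.items() if n > 1)


doc = "#| label: foundation-cost-calc\nx = 1\n#| label: foundation-cost-calc\n#| label: other\n"
print(duplicate_cell_labels(doc))  # ['foundation-cost-calc']
```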
2026-02-06 10:02:09 -05:00
Vijay Janapa Reddi
0343a8a536 fix: resolve pre-commit errors (footnote, label, formatting)
- ml_systems: move [^fn-dgx-spark-edge] footnote out of table cell
  into the table caption text
- data_selection: rename fig-foundation-cost-data to foundation-cost-calc
  (computation cell, not a figure)
- Auto-formatter fixes: collapse blank lines, prettify pipe tables
2026-02-06 10:00:45 -05:00
Vijay Janapa Reddi
3d54da6305 fix: resolve inline Python build errors across Vol 1 chapters
Fix NameError build failures in ml_systems, data_engineering, and
benchmarking chapters caused by missing imports and variables referenced
before their defining code cells.

- ml_systems: add missing Kparam and Bparam imports from physx.constants
- data_engineering: compute transfer_time_10g_md preview in setup cell,
  add md_math import, add deduplication-dividend-calc cell, convert
  hardcoded values to physics engine units
- benchmarking: compute BERT roofline preview values in roofline-example-calc
  cell before they are referenced in narrative text, convert hardcoded
  values to inline Python, condense redundant footnotes
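The failure mode above is inline Python that references a name before the cell that defines it. A static pass in document order can approximate a check for this. The sketch below is rough and makes two assumptions: each cell opens with a `{python}` fence line, and inline references look like `{python} name` wrapped in single backticks. It only handles bare names, and `refs_before_definition` is hypothetical.

```python
import ast
import re

# Assumed syntax: cell body follows a {python} fence; inline refs are `{python} name`.
CELL_RE = re.compile(r"```\{python\}[^\n]*\n(.*?)```", re.DOTALL)
INLINE_RE = re.compile(r"`\{python\}\s+([A-Za-z_]\w*)`")


def names_defined(code: str) -> set[str]:
    """Names bound anywhere in a cell (a deliberately loose approximation)."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                names.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            names.add(node.name)
    return names


def refs_before_definition(qmd_text: str) -> list[str]:
    """Inline references whose name no earlier cell has defined."""
    events = [(m.start(), "cell", m.group(1)) for m in CELL_RE.finditer(qmd_text)]
    events += [(m.start(), "ref", m.group(1)) for m in INLINE_RE.finditer(qmd_text)]
    defined: set[str] = set()
    missing = []
    for _, kind, payload in sorted(events):  # document order
        if kind == "cell":
            defined |= names_defined(payload)
        elif payload not in defined:
            missing.append(payload)
    return missing


doc = "Speed is `{python} early`.\n\n```{python}\nfrom math import pi\nearly = 1\nlate = 2\n```\n\nThen `{python} late` and `{python} pi`.\n"
print(refs_before_definition(doc))  # ['early']
```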

Also includes physics engine integration improvements across all Vol 1
chapters: unit-safe conversions, inline Python for previously hardcoded
values, streamlined footnotes with cross-references, and new content
validation scripts.

All 21 Vol 1 chapters pass PDF build tests.
2026-02-06 09:57:25 -05:00
Vijay Janapa Reddi
1d19aa676b refactor: remove conceptual redundancies across Volume 1 chapters
Systematic redundancy removal for MIT Press submission. Applied two-phase
editorial process: (1) identified all conceptual repetition and near-duplication
across main text, examples, and callouts; (2) executed targeted edits to
eliminate redundant content while preserving tone and structure.

Files modified (22 chapters):
- Frontmatter: about, acknowledgements, notation
- Part I: introduction, ml_systems, workflow, data_engineering
- Part II: dl_primer, dnn_architectures, frameworks, training
- Part III: data_selection, hw_acceleration, benchmarking
- Part IV: serving, ops, responsible_engr, conclusion
- Appendices: appendix_dam, appendix_machine, appendix_algorithm, appendix_data

Net reduction: ~72 lines of redundant content removed
2026-02-06 06:49:25 -05:00
Vijay Janapa Reddi
0e17889b20 fix: add caption and alt-text to composite figure in socratiq.qmd
The fig-quizzes composite figure was missing required fig-cap and
fig-alt attributes. Added descriptive caption and accessibility text.
2026-02-06 06:10:48 -05:00
Vijay Janapa Reddi
56657d8152 fix: move footnotes out of forbidden locations (callouts, tables)
- data_engineering.qmd: Move pricing footnote after callout block
- data_engineering.qmd: Convert SATA footnote to inline text
- dl_primer.qmd: Move GPT-4 estimate note to table caption
- introduction.qmd: Move Box quote after callout, remove unused fn-algorithm

All footnotes now follow Quarto rendering rules.
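The following is a minimal sketch of a check for the two forbidden locations named above. It assumes callouts are fenced divs whose attributes contain `.callout` and that tables are pipe tables. `forbidden_footnotes` is a hypothetical name, and it returns 1-based line numbers.

```python
import re

FOOTNOTE_REF_RE = re.compile(r"\[\^[^\]]+\](?!:)")  # a reference, not a "[^x]:" definition
DIV_OPEN_RE = re.compile(r"^:{3,}\s*\S")            # "::: {.callout-note}" and similar
DIV_CLOSE_RE = re.compile(r"^:{3,}\s*$")            # a bare ":::" closes the innermost div


def forbidden_footnotes(qmd_text: str) -> list[int]:
    """Return line numbers of footnote references inside callouts or pipe tables."""
    stack = []  # one entry per open div: True if that div is a callout
    bad = []
    for lineno, line in enumerate(qmd_text.splitlines(), start=1):
        stripped = line.strip()
        if DIV_CLOSE_RE.match(stripped):
            if stack:
                stack.pop()
            continue
        if DIV_OPEN_RE.match(stripped):
            stack.append(".callout" in stripped)
            continue
        in_callout = any(stack)
        in_table = stripped.startswith("|")
        if (in_callout or in_table) and FOOTNOTE_REF_RE.search(line):
            bad.append(lineno)
    return bad


doc = "\n".join([
    "Intro text.[^fn-ok]",
    "",
    "::: {.callout-note}",
    "Inside callout.[^fn-bad]",
    ":::",
    "",
    "| a | b |",
    "|---|---|",
    "| 1 | x[^fn-table] |",
    "",
    "[^fn-ok]: A definition.",
])
print(forbidden_footnotes(doc))  # [4, 9]
```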
2026-02-06 06:07:08 -05:00
Vijay Janapa Reddi
e942b552ba fix: resolve cross-reference issues and add missing table/figure refs
- Update check_unreferenced_labels.py to detect YAML id: frontmatter
- Add references to all unreferenced tables and listings in Vol1
- Scope unreferenced labels hook to Vol1 only (Vol2 has WIP chapters)
- Fix inline Python in LaTeX math blocks across multiple chapters
- Update test_units.py to use Dense (not Sparse) H100 FLOPS values
- Update validate_inline_refs.py regex to ignore escaped dollar signs
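Here is a sketch of an unreferenced-label check that covers three label forms, one of which is a YAML-style `id:` entry. The exact forms and the prefixes (`tbl`, `fig`, `lst`) are assumptions. This is not the contents of check_unreferenced_labels.py.

```python
import re

# Assumed label forms: attribute labels {#tbl-x}, cell options "#| label: tbl-x",
# and YAML-style "id: tbl-x" entries. [\w-] keeps underscores (lst-conv_layer_spatial).
LABEL_PATTERNS = [
    re.compile(r"\{#((?:tbl|fig|lst)-[\w-]+)"),
    re.compile(r"^#\|\s*label:\s*((?:tbl|fig|lst)-[\w-]+)", re.MULTILINE),
    re.compile(r"^\s*id:\s*['\"]?((?:tbl|fig|lst)-[\w-]+)", re.MULTILINE),
]
REF_RE = re.compile(r"@((?:tbl|fig|lst)-[\w-]+)")


def unreferenced_labels(text: str) -> list[str]:
    """Labels defined in any supported form but never cited with @label."""
    labels = set()
    for pattern in LABEL_PATTERNS:
        labels.update(pattern.findall(text))
    refs = set(REF_RE.findall(text))
    return sorted(labels - refs)


doc = "See @tbl-used.\n\n: Caption {#tbl-used}\n\n#| label: fig-orphan\n\nid: lst-from-yaml\n"
print(unreferenced_labels(doc))  # ['fig-orphan', 'lst-from-yaml']
```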

Key files fixed:
- appendix_algorithm.qmd: @tbl-tensor-op-ref, @fig-broadcasting-rules
- appendix_data.qmd: @tbl-data-gravity, @tbl-serialization-cost
- appendix_dam.qmd: @tbl-dam-overlap, @tbl-bottleneck-actions, etc.
- appendix_machine.qmd: @tbl-latency-hierarchy, @tbl-hardware-cheatsheet
- frameworks.qmd: @lst-gradient-accumulation, @lst-custom-autograd-function
- dnn_architectures.qmd: @lst-conv_layer_spatial
2026-02-06 06:03:19 -05:00
Vijay Janapa Reddi
962427ffa2 refactor: continue Physics Engine integration across Volume 1
- Update appendix files with dynamic variable references
- Consolidate references.bib entries
- Apply inline Python patterns to remaining chapters
- Fix notation and formatting consistency
2026-02-06 05:18:43 -05:00
Vijay Janapa Reddi
23d76ac82e Refactor Volume 1 to use dynamic variables from Physics Engine
- Audited all .qmd files in Volume 1 to identify hardcoded numerical constants.
- Replaced hardcoded numbers with dynamic Python variables derived from `physx/constants.py`.
- Updated `physx/constants.py` with missing constants (e.g., battery specs, dataset sizes).
- Created new Python calculation blocks in chapters to derive local metrics (e.g., energy per inference, training costs) from global constants.
- Ensured mathematical consistency across chapters by linking all values to a single source of truth.
- Fixed a citation in references.bib.

This ensures that future updates to core constants (e.g., hardware specs) will automatically propagate throughout the text.
2026-02-06 04:59:21 -05:00
Vijay Janapa Reddi
e5d9dc06e1 refactor: replace remaining hardcoded byte sizes with constants
Additional locations updated to use BYTES_FP32, BYTES_FP16, and ALLREDUCE_FACTOR
from physx/constants.py instead of hardcoded values.

Files updated:
- appendix_algorithm.qmd: bytes_per_fp32 → BYTES_FP32.magnitude
- dl_primer.qmd: bytes_per_param for MNIST → BYTES_FP32.magnitude
- hw_acceleration.qmd: bytes_per_float for tensor calc → BYTES_FP32.magnitude
- serving.qmd: bytes_per_param for KV cache → BYTES_FP16.magnitude
- training.qmd: bytes_per_param_fp16 → BYTES_FP16.magnitude
2026-02-06 03:30:01 -05:00
Vijay Janapa Reddi
2f9899153c style: pre-commit fixes and inline Python improvements
- Fix LaTeX equations in appendix_dam using md() for proper rendering
- bibtex-tidy reformatting of references.bib
- Table alignment fixes across multiple chapters
- Minor formatting cleanup from pre-commit hooks
2026-02-06 03:28:13 -05:00
Vijay Janapa Reddi
8ce4e20549 refactor: use global constants for byte sizes and model parameters
Replace hardcoded byte sizes (2 for FP16, 4 for FP32) and model parameters
with global constants from physx/constants.py for consistency.

Changes:
- Add model_memory() helper to physx/formulas.py for standardized memory calculations
- Replace manual memory calculations with model_memory(params, bytes_per_param, unit)
- Use BYTES_FP16, BYTES_FP32 constants instead of hardcoded 2/4 values
- Use GPT2_PARAMS, GPT3_PARAMS constants instead of local 1.5e9/175e9 values
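The sketch below shows what a `model_memory(params, bytes_per_param, unit)` helper could look like. The signature comes from the commit message. The body, the decimal unit table, and the plain-number constants are assumptions. The `.magnitude` usage elsewhere in this log suggests the real constants carry units.

```python
# Hypothetical plain-number stand-ins for physx/constants.py values.
BYTES_FP32 = 4
BYTES_FP16 = 2
GPT2_PARAMS = 1.5e9
GPT3_PARAMS = 175e9

# Decimal (SI) byte units; whether the book uses GB or GiB is an assumption.
_UNIT_BYTES = {"B": 1, "KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}


def model_memory(params: float, bytes_per_param: float, unit: str = "GB") -> float:
    """Memory needed to hold `params` weights at `bytes_per_param` each."""
    if unit not in _UNIT_BYTES:
        raise ValueError(f"unknown unit: {unit!r}")
    return params * bytes_per_param / _UNIT_BYTES[unit]


print(model_memory(GPT3_PARAMS, BYTES_FP16, "GB"))  # 350.0
print(model_memory(GPT2_PARAMS, BYTES_FP32, "GB"))  # 6.0
```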

Files updated: hw_acceleration, dnn_architectures, training, data_engineering,
dl_primer, frameworks
2026-02-06 03:26:33 -05:00
Vijay Janapa Reddi
184fdf34b8 Fix and verify bibliography references
Comprehensive verification and cleanup of references.bib:

- Verified and updated 38 entries with missing URLs/ISBNs/DOIs
- Fixed critical error in vaswani2017attention (Transformer paper)
  - Had incorrect DOI from "Shenzhen Medical Academy" dated 2025
  - Corrected to proper 2017 NeurIPS publication with arXiv URL
- Removed 4 fabricated/unverifiable references
  - Chowdhery2021 (fake Edge TPU paper, not cited)
  - Cheng2022 (fake memory-efficient DL survey)
  - huang2023adaptive (fake autonomous driving paper)
  - yu2023efficient (fake early exit paper)
- Added 2 verified replacement references
  - chen2024eellm (EE-LLM: ICML 2024)
  - seo2023neuroflow (NeuroFlow: arXiv 2023)
- Updated citations in model_compression.qmd to use verified sources

Key papers verified: GPT-3, BERT, Transformer, InstructGPT, Switch
Transformers, Vision Transformer, CLIP, DALL-E, ResNet, SimCLR,
AlpaServe, Ansor, GShard, Clockwork, DeepSpeed, TensorFlow Lite Micro,
MLPerf Mobile, Edge Impulse, and many more.

Results: 759 entries (down from 760), 92.5% with verification metadata,
all critical errors and fabrications eliminated.
2026-02-06 02:17:31 -05:00
Vijay Janapa Reddi
75bb63d9e3 style: standardize compute cell headers with PURPOSE/INPUT/PROCESS/OUTPUT
Apply consistent header format to setup cells in appendix_machine.qmd
and appendix_dam.qmd. All compute cells now follow the same structured
pattern used throughout ml_systems.qmd and other chapters.
2026-02-06 01:33:49 -05:00
Vijay Janapa Reddi
c1a3d08284 refactor: replace hardcoded arithmetic with computed inline refs across Vol1
Convert magic numbers and hardcoded calculations to Python-computed
inline references following the Computed Arithmetic Rule. Changes span
appendices (D·A·M, Machine Foundations), all main chapters, and glossary.

Key improvements:
- Amdahl's/Gustafson's Law examples now compute all derived values
- Training time formula example uses computed days/minutes
- Little's Law example computes concurrent requests from QPS×latency
- Bandwidth-latency example parameterizes link speed and ping
- Glossary consolidates forward pass/forward propagation entries
- Add audit_narrative.py script for prose validation
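The formulas behind these examples are standard. Below is a sketch of the computed versions. The function names are hypothetical, and the latency model is simplified to one-way latency plus serialization time, so a round-trip ping figure would need converting first.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's Law: fixed problem size; the serial part caps the speedup."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction: float, workers: int) -> float:
    """Gustafson's Law: problem size grows with the number of workers."""
    return (1.0 - parallel_fraction) + parallel_fraction * workers


def littles_law_concurrency(qps: float, latency_s: float) -> float:
    """Little's Law, L = lambda * W: average requests in flight."""
    return qps * latency_s


def transfer_time_s(size_bytes: float, link_bits_per_s: float, one_way_latency_s: float) -> float:
    """One-way latency plus serialization time (8 bits per byte)."""
    return one_way_latency_s + size_bytes * 8 / link_bits_per_s


print(round(amdahl_speedup(0.95, 8), 2))     # 5.93
print(round(gustafson_speedup(0.95, 8), 2))  # 7.65
print(littles_law_concurrency(1000, 0.05))   # 50.0 requests in flight
print(transfer_time_s(1e9, 10e9, 0.010))     # about 0.81 s for 1 GB over 10 Gb/s
```

The two speedup laws use the same parallel fraction and worker count and still disagree. That gap is the point of contrasting them: Amdahl holds the workload fixed, while Gustafson scales the workload with the machine.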
2026-02-06 01:32:17 -05:00
Vijay Janapa Reddi
2cee4e9b81 Add lead-in sentence for Statistics of Representation callout
Add contextual lead-in before the notebook callout to maintain
consistency with other chapters' callout patterns.
2026-02-06 00:55:31 -05:00
Vijay Janapa Reddi
40e71b54c7 Improve figure reference narrative and fix factual inaccuracies across Vol1
Improve how all ~260 figure references flow in the prose across all 16
chapters and appendices. Replace generic verbs (illustrates, shows, depicts)
with directive, student-engaging language that tells readers what to observe
and why.

Also fix 16 factual inaccuracies found during verification audit:
- introduction: correct compute growth from "five" to "eight" orders of magnitude
- frameworks: fix inverted slope descriptions and crossover magnitudes in
  compilation continuum; correct "embedded targets" to "language bindings"
- data_engineering: remove fabricated "feature engineering" stage from TFX
  pipeline; remove unverifiable animal species names from hard labels
- benchmarking: correct power units from "microwatts/megawatts" to
  "milliwatts/hundreds of kilowatts"
- responsible_engr: correct governance pillar labels to match figure caption
- ml_systems: fix cloud ML examples, mobile ML characteristics, and hybrid
  sync description to match actual figure content
- training: correct LLM scaling curve attribution; fix node color description
- hw_acceleration: fix tiling diagram description
- model_compression: fix quantization error distribution description
- dnn_architectures: fix im2col kernel size; fix attention visualization
2026-02-06 00:08:02 -05:00
Vijay Janapa Reddi
41ef0cacdb wip: isolate hw_acceleration chapter and fix missing imports
- Comment out all chapters except hw_acceleration in PDF config for focused testing
- Add missing physx.constants imports to ml_systems TCO calculation block
- Update figure manifest to reflect single-chapter build
2026-02-05 22:18:03 -05:00
Vijay Janapa Reddi
17c2de646d docs: apply stashed prose improvements and tighten figure references 2026-02-05 21:17:56 -05:00
Vijay Janapa Reddi
3f12a6555e refactor: rename DAM Taxonomy to D·A·M taxonomy and standardize terminology 2026-02-05 21:09:52 -05:00
Vijay Janapa Reddi
605c48737d Docs: clarify Vol1 figure context
Tighten surrounding narrative for key figures to note units and illustrative
assumptions without changing the book's structure or tone.
2026-02-05 15:44:53 -05:00
Vijay Janapa Reddi
83f05a51b5 feat(pdf): update layout to MIT Press 8x10 specifications
Per MIT Press production feedback (Feb 2026):
- Change paper size from 7x10 to 8x10 inches
- Set 1/2" top margin to header
- Set 5/8" bottom margin
- Set 7/8" gutter (inner margin)
- Move page numbers to outside edge (standard book convention)
- Change PDF layout from TwoPageRight to SinglePage for preflight

Also adds copyedit configs for double-spaced PDFs:
- _quarto-pdf-vol1-copyedit.yml
- _quarto-pdf-vol2-copyedit.yml
2026-02-05 14:28:36 -05:00
Vijay Janapa Reddi
20b54a774e chore: update volume configs and frontmatter assets
Remove legacy _quarto.yml and figure index, adjust volume config files,
refresh acknowledgements/references, and add theme and epub assets.
2026-02-04 17:42:17 -05:00
Vijay Janapa Reddi
0fba57e1b0 docs: annotate egress pricing baseline
Document AWS 2024 egress pricing as the baseline and note it in the data gravity callout.
2026-02-04 17:41:07 -05:00
Vijay Janapa Reddi
354bbeee31 refactor: standardize vol1 constants and conversions
Route canonical time, precision, pricing, and reference values through physx.
Update vol1 QMDs to use shared constants and conversion factors.
2026-02-04 17:32:43 -05:00
Vijay Janapa Reddi
563061a0aa refactor: centralize canonical constants in physx
Move energy, network, AlexNet, and carbon baselines into physx constants.
Wire vol1 QMDs to consume those constants for consistent formatting.
2026-02-04 17:21:11 -05:00
Vijay Janapa Reddi
19fb2fba78 fix: use accelerator-first terminology in purpose sections
Purpose sections are abstract by design—they teach principles,
not specific hardware. Replace GPU/TPU references with
"accelerators" in the three Vol 1 purpose sections that
named specific hardware (serving, hw_acceleration, dl_primer).
2026-02-04 17:20:10 -05:00
Vijay Janapa Reddi
47bd285d29 fix: clarify carbon conversion and derive low-util energy
Make the CO2 conversion formula explicit about the hour term.
Compute low-utilization joules/token from idle power and throughput.
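Below is a sketch of both calculations with the units spelled out. The function names and example inputs are illustrative and are not the book's values.

```python
def energy_kwh(power_w: float, hours: float) -> float:
    """Energy in kWh: watts times hours, divided by 1000 W per kW."""
    return power_w * hours / 1000.0


def co2_kg(power_w: float, hours: float, grid_kg_per_kwh: float) -> float:
    """Emissions in kg CO2: energy (kWh) times grid carbon intensity (kg/kWh)."""
    return energy_kwh(power_w, hours) * grid_kg_per_kwh


def joules_per_token(power_w: float, tokens_per_s: float) -> float:
    """1 W = 1 J/s, so watts divided by tokens/s leaves joules per token."""
    return power_w / tokens_per_s


# Illustrative inputs only.
print(co2_kg(700, 24, 0.4))       # about 6.72 kg (16.8 kWh at 0.4 kg/kWh)
print(joules_per_token(100, 50))  # 2.0 J/token
```

Dropping the hour term silently turns watts into "kWh per hour". That is why the conversion keeps `hours` as an explicit argument.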
2026-02-04 16:38:22 -05:00
Vijay Janapa Reddi
668cc25030 refactor: inline QMD plots and slim viz helpers
Move remaining plot logic into QMD blocks and keep physx/viz styling-only.
Update preview scripts to use local plot code.
2026-02-04 16:34:31 -05:00
Vijay Janapa Reddi
ab9d9b49a5 feat: Add volume-specific theming system
- Vol1: Harvard Crimson (#A51C30)
- Vol2: ETH Zurich Blue (#1F407A)

Architecture:
- themes/_theme-harvard.scss, _theme-eth.scss: Color variables
- _base-styles.scss, _dark-mode-base.scss: Shared styles using $accent
- style-vol1/2.scss, dark-mode-vol1/2.scss: Entry points per volume

Each volume now has its own distinct visual identity while sharing
the same underlying style rules.
2026-02-04 15:48:52 -05:00
Vijay Janapa Reddi
e236277925 Move shelved AutoML section from vol1 to vol2 optimization
AutoML content is better suited for Volume II's optimization chapter
(distributed-scale model search). Moved from vol1/optimizations/ to
vol2/optimization/ to keep it accessible for future integration.
2026-02-04 15:16:47 -05:00
Vijay Janapa Reddi
29fabf35c1 Fix figure list to handle appendix figures and exclude shelved files
Two issues:
1. LaTeX parser regex only matched numeric figure numbers (e.g., 1.1)
   but appendices use letter prefixes (B.1, C.2, D.1). Changed \d+ to
   [A-Z\d]+ so all 214 figures are captured.
2. --scan-all mode picked up _shelved QMD files that aren't in the
   actual build, causing a count mismatch. Added _shelved to skip list.
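Both fixes can be reproduced in isolation. The sketch below assumes figure numbers arrive as bare strings like `B.1`. It also assumes shelved files sit under a `_shelved` path component; the actual skip-list logic is not shown in this log.

```python
import re
from pathlib import PurePosixPath

# Figure-number patterns before and after the fix.
OLD_FIG_NUM = re.compile(r"\d+\.\d+")       # chapters only: 1.1, 12.4
NEW_FIG_NUM = re.compile(r"[A-Z\d]+\.\d+")  # also appendices: B.1, C.2, D.1

# Assumption: shelved material lives under a path component named "_shelved".
SKIP_PARTS = {"_shelved"}


def is_figure_number(text: str, pattern: re.Pattern = NEW_FIG_NUM) -> bool:
    return pattern.fullmatch(text) is not None


def in_build(qmd_path: str) -> bool:
    return not SKIP_PARTS.intersection(PurePosixPath(qmd_path).parts)


numbers = ["1.1", "12.4", "B.1", "C.2", "D.1"]
print([n for n in numbers if is_figure_number(n, OLD_FIG_NUM)])  # ['1.1', '12.4']
print([n for n in numbers if is_figure_number(n)])               # all five
```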
2026-02-04 15:14:36 -05:00
Vijay Janapa Reddi
ac3c9ab2e5 Fix figure list regex to handle LaTeX braces and apostrophes
Three regex bugs caused missing/truncated captions in the figure list:
1. div_pattern broke on LaTeX {} (e.g., $W_{hh}$, \index{...}) — fixed
   with greedy .* anchored to end-of-line
2. Caption/alt regex [^"']+ truncated at apostrophes (e.g., Moore's) —
   fixed by matching double-quote delimiters only: "([^"]*)"
3. Duplicate figures when ::: div wraps a code block — added dedup logic

Fixes applied to both generate_figure_list.py and figure_list_for_press.py.
Regenerated FIGURE_LIST_VOL1.csv: 182 figures, 0 empty captions.
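Here is a before/after sketch of the first two regex fixes, run on sample lines. The patterns are reconstructed from the description above and are not copied from either script.

```python
import re

# Fix 1: div attributes. The old pattern stops at the first "}", which breaks on
# LaTeX braces; the new one is greedy and anchored to the end of the line.
OLD_DIV = re.compile(r"^:::\s*\{([^}]*)\}")
NEW_DIV = re.compile(r"^:::\s*\{(.*)\}\s*$")

# Fix 2: captions. The old pattern accepts either quote and stops at any
# apostrophe; the new one uses double-quote delimiters only.
OLD_CAP = re.compile(r"fig-cap=[\"']([^\"']+)")
NEW_CAP = re.compile(r'fig-cap="([^"]*)"')


def first_group(pattern: re.Pattern, line: str):
    match = pattern.search(line)
    return match.group(1) if match else None


rnn = '::: {#fig-rnn fig-cap="Recurrent weights $W_{hh}$"}'
moore = '::: {#fig-moore fig-cap="Moore\'s Law trend"}'

print(first_group(OLD_DIV, rnn))    # truncated right after "$W_{hh"
print(first_group(NEW_DIV, rnn))    # the full attribute string
print(first_group(OLD_CAP, moore))  # Moore
print(first_group(NEW_CAP, moore))  # Moore's Law trend
```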
mit-submission-v1
2026-02-04 15:03:17 -05:00
Vijay Janapa Reddi
f0edc97e0d remove vol 1 title 2026-02-04 08:19:32 -05:00
Vijay Janapa Reddi
8fb27cc973 mit release (after fig alt issue fix) 2026-02-04 08:14:34 -05:00
Vijay Janapa Reddi
c63e1429f2 figure listing 2026-02-04 07:25:56 -05:00
Vijay Janapa Reddi
765896b90d fix principle references 2026-02-04 07:25:42 -05:00
Vijay Janapa Reddi
ace5f2f673 Fix malformed equation in serving.qmd
Convert plain text equation to proper LaTeX math block with label
for @eq-precision-throughput cross-reference to work.
2026-02-04 02:32:52 -05:00
Vijay Janapa Reddi
7f0e31bfb4 Fix fenced div and footnote warnings
- Fix malformed div in networking.qmd: :::.column-margin -> ::: {.column-margin}
- Add missing footnote reference [^fn-box-model] in introduction.qmd
2026-02-04 02:25:22 -05:00
Vijay Janapa Reddi
9d0eb24fa3 Enable all chapters in PDF vol1 config for full book builds
Uncomment all frontmatter, chapters, and appendices so future
builds include the complete book with all figure numbers.
2026-02-04 02:21:19 -05:00
Vijay Janapa Reddi
1a36108b49 Consolidate figure list scripts into single file with --clear flag
- Merge clear_figure_cache.py into generate_figure_list.py
- Pre-render: generate_figure_list.py --clear
- Post-render: generate_figure_list.py
- Single file easier to maintain
2026-02-04 02:17:56 -05:00
Vijay Janapa Reddi
a702f879ae Add automatic figure list generation for MIT Press
- Add pre-render hook to clear stale LaTeX data between builds
- Add post-render hook to generate FIGURE_LIST.txt in output dir
- LaTeX captures figure numbers and pages during compilation
- Use deferred write for accurate page numbers (after float placement)
- Python merges with QMD captions and alt-text
- Output automatically appears in _build/pdf-vol1/ after each build
2026-02-04 02:13:16 -05:00
Vijay Janapa Reddi
6c5ffae4cb stale 2026-02-04 01:23:41 -05:00
Vijay Janapa Reddi
9e857f318d fixes 2026-02-04 01:18:33 -05:00
Vijay Janapa Reddi
d29965a0c3 Fix: move #| directives before imports in all code blocks
Quarto requires #| directives to be at the start of code blocks.
Fixed 93+ code blocks across 15 files where imports came before
the echo: false directive, causing code to be visible in PDFs.
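The sketch below checks this ordering rule. It assumes option lines start with `#|` and treats blank lines as neutral; that treatment of blank lines is also an assumption. `options_misplaced` is hypothetical.

```python
def options_misplaced(cell_source: str) -> bool:
    """True if a '#|' option line appears after any non-option content."""
    seen_code = False
    for line in cell_source.splitlines():
        if line.startswith("#|"):
            if seen_code:
                return True
        elif line.strip():
            seen_code = True
    return False


bad_cell = "import numpy as np\n#| echo: false\nx = np.zeros(3)\n"
good_cell = "#| echo: false\nimport numpy as np\nx = np.zeros(3)\n"
print(options_misplaced(bad_cell), options_misplaced(good_cell))  # True False
```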
2026-02-04 00:36:33 -05:00
Vijay Janapa Reddi
8094efe659 Simplify plotting code: project-wide PYTHONPATH + viz returns plt
- Added PYTHONPATH='.' to quarto execute config
- Modified viz.setup_plot() to return (fig, ax, COLORS, plt)
- Cleaned up all plotting cells to use simple imports
- No more sys.path manipulation needed in individual cells
2026-02-04 00:30:16 -05:00
Vijay Janapa Reddi
0ef4842d91 Fix: use '.' for sys.path to import physx module
sys.path.insert(0, '.') puts the current working directory (the project
root, when code executes from there) at the front of Python's module search
path, allowing 'from physx import viz' to find the physx package.
2026-02-04 00:26:27 -05:00