Commit Graph

1494 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
dcf48671e2 Merge remote-tracking branch 'origin/feature/book-volumes' into feature/book-volumes 2026-02-27 08:09:51 -05:00
Vijay Janapa Reddi
acd3f59f4f Displays pre-flight build manifest in output
Introduces a detailed build manifest that appears in a dedicated output channel prior to any build or debug command execution.

The manifest provides key information about the upcoming operation, including the target volume, build format, execution mode (sequential or parallel), the Quarto configuration file in use, and a comprehensive list of all chapters slated for compilation. The chapter list is derived directly from the Quarto YML, acting as a single source of truth that reflects the full intended book structure, even for entries that are currently commented out.

Additionally, the manifest clearly displays the exact shell command that will be executed, enhancing transparency and aiding in debugging.
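
A minimal sketch of how such a manifest could be assembled, assuming chapters appear in the Quarto YML as list items ending in `.qmd`. The function names, sample paths, and shell command are illustrative and not taken from the extension:

```python
import re

# Assumed shape of a chapter entry: a YAML list item naming a .qmd file,
# optionally commented out with a leading "#".
CHAPTER_RE = re.compile(r"^\s*(#\s*)?-\s*(\S+\.qmd)\s*$")


def chapters_from_quarto_yml(yml_text):
    """Return (path, enabled) pairs for every chapter entry, commented out or not."""
    chapters = []
    for line in yml_text.splitlines():
        match = CHAPTER_RE.match(line)
        if match:
            chapters.append((match.group(2), match.group(1) is None))
    return chapters


def format_manifest(volume, build_format, parallel, config_path, chapters, command):
    """Render the manifest as plain text for an output channel."""
    mode = "parallel" if parallel else "sequential"
    lines = [
        f"Volume:  {volume}",
        f"Format:  {build_format}",
        f"Mode:    {mode}",
        f"Config:  {config_path}",
        "Chapters:",
    ]
    for path, enabled in chapters:
        marker = "x" if enabled else " "  # blank marker = commented out in the YML
        lines.append(f"  [{marker}] {path}")
    lines.append(f"Command: {command}")
    return "\n".join(lines)


# Hypothetical config contents, for illustration only.
SAMPLE_YML = """\
book:
  chapters:
    - index.qmd
    - contents/vol1/introduction.qmd
    # - contents/vol1/training.qmd
"""

chapters = chapters_from_quarto_yml(SAMPLE_YML)
manifest = format_manifest(
    "vol1", "pdf", False, "_quarto.yml", chapters, "quarto render --to pdf"
)
print(manifest)
```

Reading the commented-out lines with a regular expression rather than a YAML parser is what lets the chapter list show the full intended structure, since a YAML parser would drop the comments.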
2026-02-27 08:09:12 -05:00
Vijay Janapa Reddi
b02b38aa32 fix: resolve PDF build failures in distributed_training and robust_ai
distributed_training: fix unclosed code cell (backticks appended to comment
line), add missing variable computations (a100_mem, nvlink_a100, etc.),
reorder LEGO cells so inline Python refs follow their defining cells, fix
duplicate cell label and stray code fence near young-daly-calc.

robust_ai: add missing TikZ definitions (gear macro, brain/skull pics,
LinePE style) to the data poisoning diagram so it compiles standalone.
2026-02-27 08:08:43 -05:00
Vijay Janapa Reddi
9cba37c92d Refactor TikZ figures and standardize code constants
Introduces reusable `pic` definitions for common elements across numerous TikZ diagrams, enhancing modularity and visual consistency. Improves diagram readability through explicit node positioning and refined styling.

Standardizes hardware and model constants in Python code by using specific `mlsys.constants` and dedicated setup classes, improving maintainability and clarity.

Addresses minor LaTeX formatting in math blocks and refines unit-aware calculations.
2026-02-27 07:15:37 -05:00
Zeljko Hrcek
6de84f20e6 Update chapter 20 figures 2026-02-27 12:02:50 +01:00
Vijay Janapa Reddi
303cd26669 refactor: use fmt_percent across Vol 1 and Vol 2 to prevent Pint precision bugs
This commit standardizes percentage formatting across the entire codebase to prevent critical rendering bugs (like the `19250000000000%` effective utilization bug in Vol 2).

Root Cause:
When dividing two Pint Quantities (e.g., `flop/second` by `TFLOPs/second`), Pint keeps a mixed unit (`flop/TFLOPs`) instead of reducing it to a dimensionless number. The raw `.magnitude` of that ratio is therefore inflated by a factor of $10^{12}$. Passing it to `fmt(x * 100)` multiplied the inflated magnitude by a further 100, producing the incorrect display.

Fix:
1. Fortified `fmt_percent` and `display_percent` in `mlsys/formatting.py` to defensively strip units using `.m_as('')`. This forces Pint to cancel out the units (e.g., `flop/TFLOPs` becomes `1.0`) *before* extracting the number.
2. Replaced all instances of `fmt(X * 100)` with the fortified `fmt_percent(X)` across Vol 1 and Vol 2.
3. Fixed inline f-strings in `appendix_assumptions.qmd` by moving formatting logic into the Python setup cell as `_str` variables, adhering to the book's standard practice.
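
The root cause and fix above can be reproduced with a dependency-free toy model. This is a sketch only: `ToyQuantity` is a stand-in that mimics the described behavior and is not Pint's API, `fmt_percent_sketch` is a hypothetical name rather than the function in `mlsys/formatting.py`, and the throughput figures are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyQuantity:
    """Toy stand-in for a unit-carrying quantity (not Pint's API)."""
    magnitude: float  # the number as written, e.g. 300 for "300 TFLOPs/second"
    scale: float      # factor to base units, e.g. 1e12 for a tera-prefixed unit

    def __truediv__(self, other):
        # Divide the numbers but keep the unit ratio unreduced, as in the bug.
        return ToyQuantity(self.magnitude / other.magnitude, self.scale / other.scale)

    def as_dimensionless(self):
        # Fold the leftover unit ratio into the number so the prefixes cancel.
        return self.magnitude * self.scale


def fmt_percent_sketch(ratio, precision=1):
    """Hypothetical formatter: strip units first, then scale to a percentage."""
    value = ratio.as_dimensionless() if isinstance(ratio, ToyQuantity) else float(ratio)
    return f"{value * 100:.{precision}f}%"


achieved = ToyQuantity(1.5e14, 1.0)  # made-up figure in flop/second
peak = ToyQuantity(300.0, 1e12)      # made-up figure in TFLOPs/second
utilization = achieved / peak        # unit is flop/TFLOPs, raw magnitude 5e11

naive = f"{utilization.magnitude * 100:.0f}%"  # reads the raw magnitude
fixed = fmt_percent_sketch(utilization)        # cancels the units first

print(naive)  # 50000000000000%
print(fixed)  # 50.0%
```

Moving the unit cancellation inside the formatter means call sites no longer need to remember to convert before reading a magnitude.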

Validation:
- Audited all `.magnitude` extractions in the codebase to ensure they are safe (e.g., explicitly converting to dimensionless units first).
- Ran `validate_inline_refs.py` and confirmed no Python variables are trapped inside LaTeX math mode.
- Successfully built full PDFs for both Volume 1 and Volume 2.
2026-02-26 20:59:43 -05:00
Vijay Janapa Reddi
96336ab0c6 fix: resolve Vol 2 PDF build failures and Pint unit display bugs
- Add missing attributes to FleetFoundations in appendix_fleet.qmd
- Fix regression_testing.png image path in fault_tolerance.qmd
- Add pgfplots package to header-includes.tex for TikZ compatibility
- Fortify fmt_percent in formatting.py to handle Pint Quantities properly, fixing the 19250000000000% display bug
2026-02-26 20:46:12 -05:00
Vijay Janapa Reddi
baebb4c6d7 fix(vol1): model_serving PDF build — Python cell and TikZ
- Remove duplicate indented block in resnet-spectrum-calc cell that caused
  IndentationError (partial EXPORTS + stray class-body lines).
- Fix TikZ in fig-server-anatomy: add missing 'to' in brain path segments,
  remove stray/double commas in node and draw options.
2026-02-26 17:35:42 -05:00
Vijay Janapa Reddi
141a1efbe3 Refactor Volume 2 TikZ diagrams for structural integrity and positioning 2026-02-26 16:05:29 -05:00
Vijay Janapa Reddi
c69b6ab2d1 Add book tools (agent personas, check_figure_div_syntax) 2026-02-26 15:23:19 -05:00
Vijay Janapa Reddi
fd21a57dd3 Update vscode-ext (debug commands, terminal) 2026-02-26 15:23:08 -05:00
Vijay Janapa Reddi
5e0c9a2f5d Update book quarto mlsys (hardware, validate_inline_refs, engine) 2026-02-26 15:23:07 -05:00
Vijay Janapa Reddi
73e39a0b8e Update book index 2026-02-26 15:23:04 -05:00
Vijay Janapa Reddi
2be59e3cec Update shared frontmatter (about, socratiq) 2026-02-26 15:23:04 -05:00
Vijay Janapa Reddi
0e992b79ae Update vol2 content and config 2026-02-26 15:23:03 -05:00
Vijay Janapa Reddi
49ca6889ca Update pre-commit config 2026-02-26 15:23:01 -05:00
Vijay Janapa Reddi
c8447dd556 Update vol1 content and config 2026-02-26 15:11:04 -05:00
Vijay Janapa Reddi
45a3ad829e feat(landing): refine DAM/C3 hexagon wireframe visibility 2026-02-26 13:14:46 -05:00
Vijay Janapa Reddi
9420cfb87e feat(landing): replace sliders with DAM/C3 hexagon cube animation 2026-02-26 13:12:38 -05:00
Vijay Janapa Reddi
fe4daeb728 chore(landing): remove unused background variations 2026-02-26 12:47:54 -05:00
Vijay Janapa Reddi
59cffeef48 feat(landing): add matrix and particle background variations 2026-02-26 12:31:08 -05:00
Vijay Janapa Reddi
e0a71023e4 chore(landing): remove separate layout files in favor of unified light/dark mode 2026-02-26 11:43:19 -05:00
Vijay Janapa Reddi
fadef036e0 feat(landing): add multiple background animation variations and fix index.qmd 2026-02-26 11:42:59 -05:00
Vijay Janapa Reddi
809fd5ffce feat(landing): add dark/cyberpunk and minimal/brutalist variations 2026-02-26 11:37:48 -05:00
Vijay Janapa Reddi
293623e8e7 feat(landing): update modern landing page with pixel bg and animations 2026-02-26 11:33:23 -05:00
Vijay Janapa Reddi
bdf8f7decd Merge remote-tracking branch 'origin/feature/book-volumes' into feature/book-volumes 2026-02-26 08:10:51 -05:00
Zeljko Hrcek
b16f8f36cd A figure has been updated in chapter 18 2026-02-26 12:57:51 +01:00
Zeljko Hrcek
81e9c34ba7 Updated a figure in chapter 16 2026-02-26 10:19:42 +01:00
Vijay Janapa Reddi
79b7925b95 Landing site: two-volume hub with vol1/vol2 navbar, hero, cards, local covers 2026-02-25 15:08:33 -05:00
Vijay Janapa Reddi
9dbdac00a1 refactor: final Gold Standard polish across both volumes; ensure all mathematical variables render correctly and narrative is authoritatively consistent 2026-02-25 08:39:30 -05:00
Vijay Janapa Reddi
2de66f1c0f refactor: complete Gold Standard audit for core foundation chapters; unify Volume 1 and Volume 2 math; verify physical realism of hardware constants 2026-02-25 08:31:21 -05:00
Vijay Janapa Reddi
c990d0037e Merge remote-tracking branch 'origin/fix/ch15' into feature/book-volumes 2026-02-25 07:54:56 -05:00
Vijay Janapa Reddi
ad6229a899 Adds options for targeted reference validation
Introduces `--only-from-report` and `--only-keys` arguments to the `references` validation command.
These allow re-validating only specific citation keys, either from a previous validation report or a custom list.
This speeds up reference correction by enabling focused re-runs and reducing validation time.

Removes the standalone `README_REFERENCE_CHECK.md` documentation, as its content is now implicitly handled by the integrated CLI help and broader documentation.
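
A minimal sketch of how the two options might narrow a validation run, using `argparse`. Only the flag names come from this commit; the comma-separated key format, the `select_keys` helper, and the sample citation keys are assumptions, and loading keys from a report file is left to the caller:

```python
import argparse


def build_parser():
    """Parser for a hypothetical `references` validation command."""
    parser = argparse.ArgumentParser(prog="references")
    parser.add_argument(
        "--only-keys",
        default=None,
        help="comma-separated citation keys to re-validate (assumed format)",
    )
    parser.add_argument(
        "--only-from-report",
        default=None,
        metavar="PATH",
        help="re-validate only the keys listed in a previous validation report",
    )
    return parser


def select_keys(all_keys, only_keys=None, report_keys=None):
    """Restrict validation to the requested keys, preserving the original order."""
    wanted = set()
    if only_keys:
        wanted.update(key.strip() for key in only_keys.split(",") if key.strip())
    if report_keys:
        wanted.update(report_keys)
    if not wanted:
        return list(all_keys)  # no filter given: validate everything
    return [key for key in all_keys if key in wanted]


# Made-up citation keys, for illustration only.
args = build_parser().parse_args(["--only-keys", "smith2020, lee2021"])
selected = select_keys(["adams2019", "smith2020", "lee2021"], only_keys=args.only_keys)
print(selected)  # ['smith2020', 'lee2021']
```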
2026-02-25 07:48:18 -05:00
Vijay Janapa Reddi
aafc8f5d95 Merge remote-tracking branch 'origin/feature/book-volumes' into feature/book-volumes 2026-02-25 07:47:27 -05:00
Zeljko Hrcek
1f85111486 Updated a figure in chapter 15 2026-02-25 10:54:38 +01:00
Zeljko Hrcek
eccbd9d5d6 Updated a figure in chapter 15 2026-02-25 10:47:25 +01:00
Vijay Janapa Reddi
76b06d526f refactor: anchor Volume 2 Compute Infrastructure math to the Frontier Mission; standardize hardware twin naming and unify Vol 1 and Vol 2 logic 2026-02-24 21:10:22 -05:00
Vijay Janapa Reddi
cc7c54e4ed refactor: finalize the 'Engineering Crux' terminology (Hardware -> Systems -> Workloads -> Missions) across both volumes 2026-02-24 21:08:05 -05:00
Vijay Janapa Reddi
fdfd91bf03 refactor: sharpen System layer naming to reflect integrated Platforms (Hubs, Nodes, Phones) rather than hardware modules 2026-02-24 21:05:44 -05:00
Vijay Janapa Reddi
54957b891a chore: verify and update system numbers for B200 and ESP32 to maintain consistency with physical reality and book prose 2026-02-24 21:03:50 -05:00
Vijay Janapa Reddi
bbfdcb5e55 refactor: fully unify Volume 2 Introduction with the Engineering Crux; anchor GPT-4 failure math to GPU_MTTF_HOURS 2026-02-24 20:55:34 -05:00
Vijay Janapa Reddi
20a0918ed1 refactor: anchor Volume 2 Distributed Training math to the Frontier Mission archetype; unify Vol 1 and Vol 2 logic 2026-02-24 20:54:27 -05:00
Vijay Janapa Reddi
977daf2b7c refactor: anchor Interconnect Hierarchy and Serving Spectrum to System Archetypes; ensure self-healing math across Vol 1 2026-02-24 20:46:45 -05:00
Vijay Janapa Reddi
c57db2c2d6 feat: establish the 'Engineering Crux' hierarchy (Hardware -> Models -> Systems -> Scenarios) as the foundational framework for the curriculum 2026-02-24 20:26:22 -05:00
Vijay Janapa Reddi
56e091f7e0 feat: standardize System Archetypes in Vol 1 and Vol 2; add canonical roster table to Introductions; ensure tight math-prose integration 2026-02-24 20:23:50 -05:00
Vijay Janapa Reddi
ad843e21f7 style: Vol2 register pass follow-up #2 — fix two more violations in sustainable_ai
Flagged by the sustainable_ai editor agent as newly discovered during fixing:

- line 635: "If your cluster consumes...how much...actually went...how much was wasted?"
  → impersonal declarative; removes "your", two embedded rhetorical questions, two "actually"
- line 2261: "You want to fine-tune a small language model" in .callout-notebook
  → "Consider fine-tuning a small language model" (impersonal)
2026-02-24 19:52:53 -05:00
Vijay Janapa Reddi
d67ad7005c style: Vol2 register pass follow-up — fix missed violations in distributed_training and sustainable_ai
Post-commit verification found 6 additional violations not caught by the
initial audit agents:

distributed_training (4 fixes):
- line 108: second person "If you could purchase a single GPU" → impersonal
- line 280: rhetorical Q "How exactly do 1,024 GPUs...agree" → declarative
- line 784: second person "Your AllReduce...Where do you look?" in
  .callout-perspective → impersonal problem statement
- line 1347: rhetorical Q "where did the missing 25%...go?" → declarative

sustainable_ai (2 fixes):
- line 2047: embedded rhetorical Q "where does the dominant share of energy go?" → declarative
- line 2414: closing rhetorical Q "what happens to these clusters...?" → declarative noun phrase
2026-02-24 19:50:36 -05:00
Vijay Janapa Reddi
1ddf9bd5e3 style: Vol2 register pass — eliminate rhetorical questions, second person, vague intensifiers
Systematic register audit and fix across all 13 non-clean Vol2 chapters.
Clean chapters (compute_infrastructure, network_fabrics, inference, responsible_ai,
frontmatter, backmatter) required no edits.

Violations fixed by chapter:
- introduction: 14 fixes (rhetorical Qs, second person, vague intensifiers)
- collective_communication: 27 fixes (rhetorical Qs, contractions, second person, intensifiers)
- distributed_training: 7 fixes (all rhetorical questions → declarative statements)
- ops_scale: 6 fixes (intensifiers, second person, rhetorical Q, announcement transition)
- performance_engineering: 3 fixes (rhetorical Q, second person, announcement transition)
- robust_ai: 4 fixes (hedging, second person in callout-notebook)
- sustainable_ai: 4 fixes (rhetorical Q, second person, bold starter in callout)
- fleet_orchestration: 4 fixes (rhetorical questions)
- security_privacy: 4 fixes (banned phrase, second person, rhetorical Q)
- edge_intelligence: 4 fixes (rhetorical Q, vague intensifiers)
- fault_tolerance: 1 fix (second person in callout-notebook)
- data_storage: 1 fix (sentence-starting "But,")
- conclusion: 2 fixes (first-person "We have climbed", "To conclude" opener)

Pre-commit rendering/inline-refs failures are pre-existing on this branch
(77 files, 116 rendering issues, 179 inline-ref errors in unrelated files).
None of the 13 edited files have rendering violations.
2026-02-24 19:46:16 -05:00
Vijay Janapa Reddi
e881d92625 refactor: introduce System Archetypes in mlsys/systems.py and integrate into Introduction and Serving chapters; verify math integrity and rationale for LEGO blocks 2026-02-24 19:12:51 -05:00
Vijay Janapa Reddi
a0ce7cc746 style: Vol1 register pass — academic formality across 16 chapters
Systematic prose register audit and fix pass across all substantive
Vol1 chapters, enforcing book-prose.md Section 1 "Tone Register &
Academic Formality" rules:

- Rhetorical questions in body prose → declarative statements
- Sentence-starting coordinating conjunctions (But/And/So) → restructured
- Banned AI-pattern phrases ("leverage" → "use", "state-of-the-art" →
  "top benchmark", "powerful" → precise alternatives, "groundbreaking"
  removed, "dramatic" → quantified)
- Contractions in body prose → expanded forms
- Second person "you/your" → impersonal/third-person voice
- Vague intensifiers ("just", "simply", "actually", "perhaps", "clearly",
  "very") → removed or replaced with precise language
- Bold paragraph starters in body prose → plain text

Protected content left unchanged: Purpose hook questions, .callout-
checkpoint content, code blocks, Python cells, TikZ/LaTeX math,
Fallacy/Pitfall structural labels, direct quotations.
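
A few of the rules above can be approximated with a line-based check. This is a sketch under stated assumptions, not the audit tooling used for this pass: the rule names and patterns are illustrative, only fenced code blocks are treated as protected, and leading conjunctions are detected at the start of a line rather than the start of a sentence.

```python
import re

# Illustrative patterns for four of the register rules.
RULES = {
    "second person": re.compile(r"\b(you|your)\b", re.IGNORECASE),
    "contraction": re.compile(r"\b\w+(n't|'re|'ll|'ve)\b", re.IGNORECASE),
    "vague intensifier": re.compile(
        r"\b(just|simply|actually|perhaps|clearly|very)\b", re.IGNORECASE
    ),
    "leading conjunction": re.compile(r"^(But|And|So)\b"),
}

FENCE = "`" * 3  # code-fence marker, built this way to keep the example readable


def find_violations(text):
    """Return (line_number, rule_name) pairs, skipping fenced code blocks."""
    violations = []
    in_code = False
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_code = not in_code  # entering or leaving a protected block
            continue
        if in_code:
            continue
        for rule, pattern in RULES.items():
            if pattern.search(line):
                violations.append((number, rule))
    return violations


SAMPLE = "\n".join([
    "But you don't need a GPU.",
    FENCE + "python",
    "your_value = 1  # just code",
    FENCE,
    "The model uses one accelerator.",
])

for number, rule in find_violations(SAMPLE):
    print(f"line {number}: {rule}")
```

A check like this only flags candidates. Rewriting a rhetorical question into a declarative statement still needs an editor.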

Chapters modified (15 files, ~350 targeted edits):
introduction, ml_systems, nn_computation, ml_workflow, frameworks,
nn_architectures, training, data_selection, hw_acceleration,
model_compression, model_serving, benchmarking, ml_ops,
responsible_engr, conclusion

(data_engineering fixed in prior session)
2026-02-24 17:46:36 -05:00