Commit Graph

9020 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
95e0eafcdc refactor: rename Vol I tagline from Operate to Deploy
Update Volume I tagline from "Build, Optimize, Operate" to
"Build, Optimize, Deploy" across all documentation.

- "Deploy" better matches Part IV content (Serving, MLOps, Responsible Engineering)
- "Operate" implies ongoing management which is more Volume II territory
- Also fixes hw_acceleration table to use proper grid table format
2026-01-03 10:09:19 -05:00
Vijay Janapa Reddi
773897c07d refactor: restructure Vol II chapters and add Vol I serving chapter
This commit consolidates several structural changes:

1. Merge ondevice_learning into edge_intelligence chapter
   - Add support files (bib, concepts, glossary, quizzes)
   - Add images from former ondevice_learning
   - Update all cross-references from @sec-ondevice-learning to @sec-edge-intelligence

2. Delete Vol II hw_acceleration chapter (merged into distributed_training)

3. Add new Serving chapter to Vol I Part IV (Deployment)
   - Create placeholder serving.qmd with section structure
   - Add support files for bibliography, concepts, glossary, quizzes

4. Update configs and documentation
   - Update _quarto-html.yml with serving chapter in sidebar and bibliography
   - Update VOLUME_STRUCTURE.md to reflect 16 chapters in Vol I
   - Update About page with simplified Part names
   - Fix table formatting in hw_acceleration chapter

Vol I now has 16 chapters (4 in each Part) matching Vol II structure.
2026-01-03 10:01:25 -05:00
Vijay Janapa Reddi
35bc9ae952 refactor: remove ondevice_learning chapter (merged into edge_intelligence)
The ondevice_learning chapter has been merged into edge_intelligence
for better thematic coherence in Volume II. Edge Intelligence now
covers federated learning, fleet coordination, and on-device adaptation
in a single comprehensive chapter.
2026-01-03 09:57:26 -05:00
Vijay Janapa Reddi
ddb8068e6b docs: remove stale volume planning documents
Remove outdated planning documents that have been superseded by
VOLUME_STRUCTURE.md which now serves as the authoritative reference
for the two-volume textbook organization.

Removed files:
- VOLUME_SPLIT_ROADMAP.md
- VOLUME_SPLIT_SURGICAL_PLAN.md
- VOLUME_STRUCTURE_PROPOSAL.md
- reviewer-feedback-synthesis-r1.md
- volume-outline-draft.md
2026-01-03 09:57:16 -05:00
Vijay Janapa Reddi
f3a38e32d5 refactor: simplify Vol I Part names to single words
Change Part names from compound to single-word format:
- ML Foundations → Foundations
- System Development → Development
- Model Optimization → Optimization
- System Operations → Deployment

Creates clean pedagogical progression: Foundations → Development → Optimization → Deployment
2026-01-03 09:31:19 -05:00
Vijay Janapa Reddi
24474d0be8 style: convert bold-term patterns to narrative prose in hw_acceleration chapters
Vol I awareness section:
- Converted 'Bold term paragraph' patterns to flowing narrative
- Added transitional phrases between paragraphs
- Maintained all technical content and quantitative details

Vol II chapter:
- Removed AI writing patterns (pivotal, formidable, harnessing)
- Restructured long parenthetical phrases for better flow
- Fixed comma-dash hybrid constructions
2026-01-02 20:36:17 -05:00
Vijay Janapa Reddi
79726df9ff refactor: migrate multi-chip hardware content from Vol I to Vol II
Vol I Changes (hw_acceleration.qmd):
- Replace detailed multi-chip section with awareness content
- Add Scaling Beyond Single Accelerators section with overview
- Forward references to Vol II for implementation details

Vol II New Chapter (scaling-ai-hardware):
- Comprehensive multi-chip acceleration chapter
- Detailed NVSwitch, TPU Pods, wafer-scale coverage
- Amdahl Law communication overhead analysis
- Distributed execution strategies
2026-01-02 20:24:27 -05:00
Vijay Janapa Reddi
22484ae24f fix: address medium-priority review findings across 6 chapters
- dnn_architectures: fix accuracy/error rate confusion, remove redundant phrase
- data_engineering: remove authoring placeholder text
- training: fix LaTeX typo, correct FLOPS calculation
- hw_acceleration: fix Intel H100 → Intel Gaudi 2 (product naming error)
- optimizations: fix figure label typo, URL capitalization, GPU/TPU caps
- workflow: fix incomplete footnote sentences, remove redundant phrases
2026-01-02 19:57:46 -05:00
Vijay Janapa Reddi
fe66e921fe fix: address review findings across 6 chapters
- workflow: remove duplicate "Studies show that Studies suggest/indicate" phrases
- ops: remove duplicate correction cascades content and maturity table, update outdated 2025 projection
- benchmarking: replace hardcoded Section 8.2/8.3 with @sec- cross-references
- dl_primer: fix incomplete figure caption, add missing TikZ commas
- efficient_ai: fix date inconsistency (2018 → 2019)
- distributed_training: remove duplicate paragraph, fix microbatching caption, correct percent→fraction
2026-01-02 19:50:38 -05:00
Vijay Janapa Reddi
63377fb48d refactor: migrate distributed training content from Vol I to Vol II
Training chapter (Vol I):
- Remove ~1,069 lines of detailed distributed training implementation
- Add ~95 lines of awareness-level content in new "Scaling Beyond Single
  Machines" section covering WHAT/WHY of distributed approaches
- Update Purpose to focus on single-machine training optimization
- Update Learning Objectives to awareness-level for distributed concepts
- Update Summary and Key Takeaways to reflect new scope
- Chapter now self-contained with no Vol II forward references

Distributed Training chapter (Vol II):
- Add comprehensive ~1,100 line chapter with full implementation details
- Data parallelism: gradient averaging, AllReduce, ring topology
- Model parallelism: layer-wise, operator-level partitioning
- Pipeline parallelism: microbatching, scheduling
- Hybrid parallelism: combining strategies for large-scale training
- Efficiency metrics, framework integration, strategy comparison
- Add bibliography with key distributed training references

Follows Hennessy & Patterson model:
- Vol I: Awareness of distributed concepts (WHAT/WHY)
- Vol II: Implementation details (HOW)
2026-01-02 19:06:32 -05:00
Vijay Janapa Reddi
ba889bcc26 docs: update book README to reflect two-volume structure
Update the book/README.md to match the new two-volume organization:
- Add Volume I (Build, Optimize, Operate) and Volume II (Scale, Distribute, Govern) sections
- Update directory structure to show vol1/ and vol2/ instead of core/ and labs/
- Fix reference to config/ directory for Quarto configuration files
2026-01-02 17:40:43 -05:00
Vijay Janapa Reddi
3db6c81c1b cleanup: remove unused xrefs cross-reference system
- Delete 20 chapter *_xrefs.json files (vol1 + vol2)
- Delete 5 cross_refs*.json data files
- Delete 2 Lua filter files (inject_xrefs.lua, inject-xrefs-clean.lua)
- Delete entire book/tools/scripts/cross_refs/ directory (29 scripts)
- Remove crossrefs YAML header from 21 chapter .qmd files
- Remove cross-references config from 3 Quarto yml files

The xrefs system was built but never enabled (all filter references
were commented out). Removing to reduce codebase complexity.
2026-01-02 17:22:55 -05:00
Vijay Janapa Reddi
8951b408d7 refactor: remove Volume I reference from distributed_training placeholder
Replace Volume I reference with self-contained description of chapter
content for standalone reading experience.
2026-01-02 17:14:16 -05:00
Vijay Janapa Reddi
26be0aecde refactor: replace Vol I self-references with this textbook in conclusion
Update Vol I Conclusion to use 'this textbook' for self-references while
keeping appropriate references to Volume II for companion volume pointers.
2026-01-02 17:13:44 -05:00
Vijay Janapa Reddi
11f8e01e2c refactor: replace Volume II references with this textbook for self-contained reading
Update Vol II Introduction and Conclusion to use 'this textbook' instead
of 'Volume II' throughout, ensuring self-contained reading experience.

- Introduction: update structure section, how-to-use section, and journey
- Conclusion: update synthesis, extended principles, and complete system sections
- Keep Volume I references where they provide context about the companion
2026-01-02 17:06:51 -05:00
Vijay Janapa Reddi
0df5ac8fd3 docs: add captions to data and model parallelism figures
Add descriptive captions to fig-data-fm-parallelism and
fig-fm-model-parallelism in the frameworks chapter, matching
the bold-title-colon-description style used by other figures
in the chapter.
2026-01-02 17:01:46 -05:00
Vijay Janapa Reddi
b42a144737 feat: add Build/Optimize/Operate structure to Vol I Introduction
Add explicit build/optimize/operate theme statement parallel to Vol II
Scale/Distribute/Govern framework. Add four-part structure table showing
how the textbook organizes around these imperatives:

- Part I: ML Foundations (Build)
- Part II: System Development (Build)
- Part III: Model Optimization (Optimize)
- Part IV: System Operations (Operate)

Update How to Use section with cleaner formatting and reading paths
for different audiences (sequential readers, practitioners, students).

Remove references to Volume I in favor of this textbook for
self-contained reading experience.
2026-01-02 16:56:10 -05:00
Vijay Janapa Reddi
b0b4b74718 polish: comprehensive editorial polish on Volume II Introduction
Validation & Verification:
- Fixed ring all-reduce timing discrepancy (28s → 56s theoretical, aligned with footnote calculation)
- Added PaLM citation that was in bibliography but not referenced in text
- Updated Facebook → Meta for company naming consistency
- Softened Microsoft Azure GPU claim (100K → tens of thousands) to remove unsourced specifics

Content Enhancements:
- Added inline definition for federated learning when first introduced
- Added inline definition for differential privacy with technical detail
- Improved clarity of ring all-reduce performance expectations

Academic Quality:
- All 14 bibliography entries now properly utilized
- All quantitative claims verified and accurate
- Footnotes technically sound and consistent with main text
- No AI writing patterns detected
- Learning objectives fully aligned with content
2026-01-02 16:51:34 -05:00
Vijay Janapa Reddi
5a53c8c5bf docs: add two-volume structure with Hennessy & Patterson model
Update README.md and About page to reflect the two-volume organization:
- Volume I: Build, Optimize, Operate (foundations, single-machine)
- Volume II: Scale, Distribute, Govern (distributed, production scale)

Reference the Hennessy & Patterson pedagogical model as the guiding
framework for content placement between volumes.
2026-01-02 10:25:34 -05:00
Vijay Janapa Reddi
b4fbf7571e feat: add purpose sections to all Vol II chapters
- Condensed Vol II Introduction purpose to single paragraph
- Added Purpose sections to 8 placeholder chapters:
  - Infrastructure: datacenter and cluster management
  - Storage: distributed storage for ML workloads
  - Communication: collective operations and network constraints
  - Distributed Training: large-scale training systems
  - Fault Tolerance: reliability in production ML
  - Inference: serving systems at scale
  - Edge Intelligence: edge-cloud coordination
  - ML Ops at Scale: organizational ML operations

All purposes follow Vol I pattern: italicized question + one foundational paragraph explaining why the topic deserves dedicated study.
2026-01-02 10:22:23 -05:00
Vijay Janapa Reddi
49520ed2ba feat: complete Vol II intro/conclusion with textbook elements
Vol II Introduction:
- Structured around Scale/Distribute/Govern themes
- Added 10 technical footnotes with systems perspective
- Added quantitative data with verified citations (14 refs)
- Added callout definitions and examples
- All claims fact-checked and corrected
- Polished academic style via stylist agent

Vol II Conclusion:
- Added bibliography with 5 citations
- Maintains synthesis of both volumes

Both volumes now self-contained with proper bridging.
2026-01-02 10:15:07 -05:00
Vijay Janapa Reddi
073231734b feat: add volume support to Binder CLI
Add --vol1 and --vol2 flags for building individual volumes:
- ./binder pdf --vol1   Build Volume I as PDF
- ./binder pdf --vol2   Build Volume II as PDF
- ./binder epub --vol1  Build Volume I as EPUB
- ./binder epub --vol2  Build Volume II as EPUB
- ./binder list --vol1  List Volume I chapters only
- ./binder list --vol2  List Volume II chapters only

Add ambiguous chapter detection for chapters that exist in both
volumes (e.g., introduction, conclusion). When ambiguous, the CLI
shows an error with guidance to use vol1/ or vol2/ prefix.

Changes:
- discovery.py: Add AmbiguousChapterError, volume-aware discovery
- build.py: Add build_volume() method
- main.py: Add volume flag handling and updated help text
2026-01-02 10:00:31 -05:00
Vijay Janapa Reddi
1f79db00f6 refactor: restructure Vol II intro around Scale/Distribute/Govern themes
Rewrite introduction to establish clear vision:
- Vol I: Build, Optimize, Operate
- Vol II: Scale, Distribute, Govern

The three imperatives now drive the narrative structure:
- Scale: Infrastructure challenges at production magnitude
- Distribute: Coordination across machines, geographies, edge devices
- Govern: Security, privacy, fairness, sustainability, accountability

Replaces survey-style structure with vision-first approach
that parallels Vol I narrative arc.
2026-01-02 09:15:22 -05:00
Vijay Janapa Reddi
bb0a096f7f feat: add Volume II introduction and conclusion chapters
Replace placeholder content with full chapters:
- Introduction: establishes scale imperative, bridges from Vol I foundations,
  previews all 15 chapters across 4 parts, provides reading guidance
- Conclusion: synthesizes extended principles at scale, describes complete
  system architecture, positions readers for continued growth

Both chapters maintain consistency with Vol I style while establishing
Vol II as a standalone yet connected volume.

Note: Cover images are placeholders copied from Vol I until custom
images are generated.
2026-01-02 09:10:41 -05:00
Vijay Janapa Reddi
2b335ee615 fix: update Volume II chapters for two-volume terminology
Updated references from 'this textbook' to 'this work' and fixed
table formatting in frontiers.qmd for proper column widths.
2026-01-02 09:04:30 -05:00
Vijay Janapa Reddi
141fd5c98d fix: update Volume I chapters for two-volume terminology
Update terminology from 'this textbook' to 'this volume' in Volume I
chapters to reflect two-volume structure.
2026-01-02 08:58:53 -05:00
Vijay Janapa Reddi
94a3a32ccd config: update Quarto configs for responsible_engr rename
Update all three Quarto configuration files to reference the renamed
Responsible Engineering chapter:
- _quarto-html.yml: chapter href and bibliography path
- _quarto-pdf.yml: chapter path
- _quarto-epub.yml: chapter path and bibliography path
2026-01-02 08:57:59 -05:00
Vijay Janapa Reddi
d5e187617b refactor: rename Responsible Systems to Responsible Engineering
Rename chapter from 'Responsible Systems' to 'Responsible Engineering'
to better reflect the focus on engineering practice and mindset rather
than systems as artifacts.

- Rename folder: responsible_systems -> responsible_engr
- Rename files: responsible_systems.qmd -> responsible_engr.qmd
- Create new bib file with updated references
- Update all section IDs from sec-responsible-systems-* to
  sec-responsible-engineering-*
2026-01-02 08:57:41 -05:00
Vijay Janapa Reddi
ee7618574c polish: update energy consumption claim for citation accuracy
Refined energy consumption statement (Line 277) to align with 2019
citation timing. Changed from "GPT-3 scale models" to "large models"
to match the Strubell et al. 2019 paper timeframe.

Polish Workflow Summary:
- Fact-checking: All technical claims verified (A-)
- Citations: 10/10 properly formatted (A)
- Cross-references: Exemplary integration (A+)
- Footnotes: 6 educational footnotes, well-placed (A)
- Glossary: 5 terms identified for textbook glossary (B+)
- Learning objectives: Perfect alignment, 97% coverage (A+)
- Content editing: Minimal correction needed (A)
- Style polish: Publication-ready prose (A+)

Overall: Chapter at publication quality. One citation phrasing
update applied. Zero AI patterns detected.
2026-01-01 19:23:34 -05:00
Vijay Janapa Reddi
25a3cd0dcf feat: add complete Responsible Systems chapter for Volume I
Add foundational chapter on responsible ML systems engineering:

- Purpose section and learning objectives framing engineering mindset
- Engineering Responsibility Gap: Amazon, COMPAS, and YouTube case studies
  illustrating when optimization succeeds but systems fail
- Responsible Engineering Checklist: pre-deployment assessment framework,
  model documentation standards, population testing, incident response
- Environmental and Cost Awareness: computational costs, brain efficiency
  benchmark, total cost of ownership, environmental impact
- Conclusion and Volume II Preview: bridges to Robust AI, Security/Privacy,
  Responsible AI, and Sustainable AI chapters

Includes 6 citations, 2 formatted tables, 5 footnotes, and cross-references
to Vol 1 chapters. Focuses on engineering mindset rather than ethics philosophy.
2026-01-01 18:01:47 -05:00
Vijay Janapa Reddi
b7d5944d12 fix: restore 'this textbook' terminology in Vol 2 chapters
- Revert 'this work' back to 'this textbook' in frontiers.qmd (14 instances)
- Revert 'this work' back to 'this textbook' in ai_for_good.qmd (2 instances)
- Both volumes are part of the same textbook, so this terminology is appropriate
2026-01-01 17:41:23 -05:00
Vijay Janapa Reddi
736dbc64f8 fix: remove 'Volume I' from table header in ondevice_learning
- Change 'Inference (Volume I)' to 'Inference' in table header
- Remove '(Volume I)' from table caption
- Run table formatter for consistent column widths
2026-01-01 17:36:37 -05:00
Vijay Janapa Reddi
9c920bfc44 refactor: update ai_for_good to use 'this work' terminology
- Change 'this textbook' to 'this work' (2 instances)
2026-01-01 17:33:38 -05:00
Vijay Janapa Reddi
96acabc712 refactor: update ondevice_learning for two-volume structure
- Change 'Part III' references to 'Volume I'
- Update table header from 'Inference (Part III)' to 'Inference (Volume I)'
- Remove 'opening Part IV: Trustworthy Systems' reference
- Fix table column width formatting for consistency
2026-01-01 17:33:16 -05:00
Vijay Janapa Reddi
2b7a702a48 refactor: update frontiers chapter for two-volume structure
- Replace "this textbook" with "this work" throughout
- Maintain references to other chapters using @sec- format
- No functional changes to content
2026-01-01 17:29:25 -05:00
Vijay Janapa Reddi
4f69f1c6e7 refactor: remove Vol 2 cross-references from remaining Vol 1 chapters
- ml_systems: remove refs to robust_ai, responsible_ai
- data_engineering: remove ref to responsible_ai
- hw_acceleration: remove refs to security_privacy, sustainable_ai
2026-01-01 17:16:54 -05:00
Vijay Janapa Reddi
49638e60c2 refactor: remove Vol 2 cross-references from foundation chapters
- dl_primer: remove refs to sustainable_ai, robust_ai, agi_systems
- dnn_architectures: remove ref to sustainable_ai
- workflow: remove refs to robust_ai, responsible_ai
2026-01-01 17:16:38 -05:00
Vijay Janapa Reddi
903bf7bb6c refactor: remove Vol 2 cross-references from optimizations chapter
Remove refs to sustainable_ai, ondevice_learning, security_privacy,
robust_ai, and responsible_ai, converting to descriptive text about
sustainability, edge deployment, and robustness considerations.
2026-01-01 17:16:18 -05:00
Vijay Janapa Reddi
1313c766e2 refactor: remove Vol 2 cross-references from frameworks and training
- frameworks: remove refs to ondevice_learning, security_privacy, responsible_ai
- training: remove refs to sustainable_ai, robust_ai, security_privacy,
  ondevice_learning, responsible_ai
2026-01-01 17:15:56 -05:00
Vijay Janapa Reddi
a1be284d47 refactor: remove Vol 2 cross-references from efficiency chapters
- efficient_ai: remove refs to sustainable_ai, ai_good, security_privacy,
  ondevice_learning, robust_ai, responsible_ai
- benchmarking: remove refs to sustainable_ai, responsible_ai, security_privacy
2026-01-01 17:15:40 -05:00
Vijay Janapa Reddi
57f2482e59 refactor: remove Vol 2 cross-references from ops chapter
Remove references to @sec-ondevice-learning, @sec-security-privacy,
and @sec-robust-ai, converting to descriptive text about edge learning,
security protocols, and fault tolerance methodologies.
2026-01-01 17:15:23 -05:00
Vijay Janapa Reddi
22eabfefca docs: update Vol 1 introduction and conclusion for self-contained volume
- Remove cross-references to Vol 2 chapters from introduction
- Update conclusion with Vol 1 specific content
- Add "Continuing in Volume II" section bridging to advanced topics
- Remove Part markers for Volume 2 content
2026-01-01 17:15:06 -05:00
Vijay Janapa Reddi
4d5bd7b731 docs: update frontmatter for two-volume structure
- Update Preface to describe two-volume organization
- Revise About page with Volume I/II structure and reading paths
- Add announcement bar notice for two-volume edition
- Change "this book" references to "this work" or "this volume"
2026-01-01 17:14:45 -05:00
Vijay Janapa Reddi
126834674f fix: update remaining core/ references to vol1/vol2
- Updated GitHub workflow link-check examples
- Updated quiz filter default scan directory to contents/
- Updated PDF config scan-directory to contents/ (scans both volumes)
- Updated Lua filter comments
- Updated quiz JSON source_file metadata to correct paths
2026-01-01 14:53:36 -05:00
Vijay Janapa Reddi
e58065e40e refactor: update epub config for two-volume structure 2026-01-01 14:49:02 -05:00
Vijay Janapa Reddi
c4367b1b92 refactor: update inject_xrefs.lua for vol1/vol2 paths 2026-01-01 14:48:56 -05:00
Vijay Janapa Reddi
9781727d60 refactor: rename advanced_intro to introduction and update scripts
- Renamed vol2/advanced_intro to vol2/introduction for consistency
- Updated all scripts and configs to use vol1/ instead of core/
- Updated pre-commit config to check all contents/ not just vol1/
- Updated path references in Lua filters, Python scripts, and configs
2026-01-01 14:46:52 -05:00
Vijay Janapa Reddi
06913458fd refactor: rename content folders to vol1 and vol2
Rename core/ to vol1/ and advanced/ to vol2/ for clarity.
Explicit naming makes it immediately obvious which volume
content belongs to.

Updated all path references in _quarto-html.yml and _quarto-pdf.yml
2026-01-01 14:43:22 -05:00
Vijay Janapa Reddi
0b90b11224 refactor: move Vol II chapters from core to advanced
Moved 7 chapters to align file structure with volume organization:
- ondevice_learning
- privacy_security
- robust_ai
- responsible_ai
- sustainable_ai
- ai_for_good
- frontiers

Updated all path references in _quarto-html.yml and _quarto-pdf.yml

Note: Cross-volume references (@sec-*) now span core/ and advanced/
which is expected for the two-volume structure.
2026-01-01 14:37:21 -05:00
Vijay Janapa Reddi
ef68ee319b docs: add Vol II part divider keys in advanced chapter files
Add \part{key:...} references for vol2_distributed, vol2_production,
vol2_responsible, and backmatter dividers
2026-01-01 14:29:02 -05:00