Update Volume I tagline from "Build, Optimize, Operate" to
"Build, Optimize, Deploy" across all documentation.
- "Deploy" better matches Part IV content (Serving, MLOps, Responsible Engineering)
- "Operate" implies ongoing management, which is more Volume II territory
- Also fixes hw_acceleration table to use proper grid table format
This commit consolidates several structural changes:
1. Merge ondevice_learning into edge_intelligence chapter
   - Add support files (bib, concepts, glossary, quizzes)
   - Add images from former ondevice_learning
   - Update all cross-references from @sec-ondevice-learning to @sec-edge-intelligence
2. Delete Vol II hw_acceleration chapter (merged into distributed_training)
3. Add new Serving chapter to Vol I Part IV (Deployment)
   - Create placeholder serving.qmd with section structure
   - Add support files for bibliography, concepts, glossary, quizzes
4. Update configs and documentation
   - Update _quarto-html.yml with serving chapter in sidebar and bibliography
   - Update VOLUME_STRUCTURE.md to reflect 16 chapters in Vol I
   - Update About page with simplified Part names
   - Fix table formatting in hw_acceleration chapter
Vol I now has 16 chapters (4 in each Part) matching Vol II structure.
The ondevice_learning chapter has been merged into edge_intelligence
for better thematic coherence in Volume II. Edge Intelligence now
covers federated learning, fleet coordination, and on-device adaptation
in a single comprehensive chapter.
Remove outdated planning documents that have been superseded by
VOLUME_STRUCTURE.md, which now serves as the authoritative reference
for the two-volume textbook organization.
Removed files:
- VOLUME_SPLIT_ROADMAP.md
- VOLUME_SPLIT_SURGICAL_PLAN.md
- VOLUME_STRUCTURE_PROPOSAL.md
- reviewer-feedback-synthesis-r1.md
- volume-outline-draft.md
Change Part names from compound to single-word format:
- ML Foundations → Foundations
- System Development → Development
- Model Optimization → Optimization
- System Operations → Deployment
Creates clean pedagogical progression: Foundations → Development → Optimization → Deployment
Training chapter (Vol I):
- Remove ~1,069 lines of detailed distributed training implementation
- Add ~95 lines of awareness-level content in new "Scaling Beyond Single
Machines" section covering WHAT/WHY of distributed approaches
- Update Purpose to focus on single-machine training optimization
- Update Learning Objectives to awareness-level for distributed concepts
- Update Summary and Key Takeaways to reflect new scope
- Chapter now self-contained with no Vol II forward references
Distributed Training chapter (Vol II):
- Add comprehensive ~1,100 line chapter with full implementation details
- Data parallelism: gradient averaging, AllReduce, ring topology
- Model parallelism: layer-wise, operator-level partitioning
- Pipeline parallelism: microbatching, scheduling
- Hybrid parallelism: combining strategies for large-scale training
- Efficiency metrics, framework integration, strategy comparison
- Add bibliography with key distributed training references
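For reviewers unfamiliar with the data-parallelism item above (gradient averaging, AllReduce, ring topology), the following is a minimal single-process sketch of the idea, not code from the chapter. The function name `ring_allreduce_mean` and the list-of-lists representation of per-worker gradients are assumptions made for illustration; a real system would use a collective-communication library rather than Python lists.

```python
def ring_allreduce_mean(worker_grads):
    """Average equal-length gradient vectors with a simulated ring all-reduce.

    worker_grads[r] is the local gradient of worker r (hypothetical layout).
    Returns one averaged vector per worker; all of them end up identical.
    """
    n = len(worker_grads)
    length = len(worker_grads[0])
    if any(len(g) != length for g in worker_grads):
        raise ValueError("all workers must hold gradients of equal length")
    if n == 1:
        return [list(worker_grads[0])]

    bufs = [list(g) for g in worker_grads]  # copies; inputs stay untouched
    # Chunk c covers indices bounds[c]:bounds[c + 1] (chunks may be uneven).
    bounds = [(c * length) // n for c in range(n + 1)]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1, reduce-scatter: in step s, worker r sends chunk (r - s) mod n
    # to its right neighbour, which adds it to its own copy.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        payloads = [[bufs[r][i] for i in chunk(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for i, v in zip(chunk(c), payload):
                bufs[dst][i] += v
    # Now worker r holds the complete sum for chunk (r + 1) mod n.

    # Phase 2, all-gather: in step s, worker r forwards chunk (r + 1 - s) mod n
    # to its right neighbour, which overwrites its stale copy.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [[bufs[r][i] for i in chunk(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for i, v in zip(chunk(c), payload):
                bufs[dst][i] = v

    # Divide the summed gradients by the worker count to get the average.
    return [[v / n for v in b] for b in bufs]
```

Each worker sends and receives only one chunk per step, which is why the ring layout keeps per-link traffic roughly independent of the number of workers.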
Follows Hennessy & Patterson model:
- Vol I: Awareness of distributed concepts (WHAT/WHY)
- Vol II: Implementation details (HOW)
Update the book/README.md to match the new two-volume organization:
- Add Volume I (Build, Optimize, Operate) and Volume II (Scale, Distribute, Govern) sections
- Update directory structure to show vol1/ and vol2/ instead of core/ and labs/
- Fix reference to config/ directory for Quarto configuration files
Update Vol II Introduction and Conclusion to use 'this textbook' instead
of 'Volume II' throughout, ensuring a self-contained reading experience.
- Introduction: update structure section, how-to-use section, and journey
- Conclusion: update synthesis, extended principles, and complete system sections
- Keep Volume I references where they provide context about the companion
Add descriptive captions to fig-data-fm-parallelism and
fig-fm-model-parallelism in the frameworks chapter, matching
the bold-title-colon-description style used by other figures
in the chapter.
Add explicit build/optimize/operate theme statement parallel to the Vol II
Scale/Distribute/Govern framework. Add a four-part structure table showing
how the textbook organizes around these imperatives:
- Part I: ML Foundations (Build)
- Part II: System Development (Build)
- Part III: Model Optimization (Optimize)
- Part IV: System Operations (Operate)
Update How to Use section with cleaner formatting and reading paths
for different audiences (sequential readers, practitioners, students).
Remove references to Volume I in favor of 'this textbook' for a
self-contained reading experience.
Validation & Verification:
- Fixed ring all-reduce timing discrepancy (28s → 56s theoretical, aligned with footnote calculation)
- Added PaLM citation that was in bibliography but not referenced in text
- Updated Facebook → Meta for company naming consistency
- Softened Microsoft Azure GPU claim (100K → tens of thousands) to remove unsourced specifics
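As background for the ring all-reduce timing fix above: this log does not record the payload size, worker count, or link bandwidth behind the 28s and 56s figures, so the sketch below uses illustrative numbers only. It implements the standard latency-plus-bandwidth estimate for a ring all-reduce; the function names and parameters are hypothetical. It also shows one way a factor-of-two gap can arise (counting only the reduce-scatter phase); whether that explains the corrected discrepancy is not stated here.

```python
def ring_reduce_scatter_seconds(num_workers, payload_bytes,
                                link_bytes_per_s, latency_s=0.0):
    """Estimate one phase of a ring all-reduce (reduce-scatter only).

    Each of the N - 1 steps moves payload / N bytes over one link.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    per_step_bytes = payload_bytes / num_workers
    return (num_workers - 1) * (latency_s + per_step_bytes / link_bytes_per_s)


def ring_allreduce_seconds(num_workers, payload_bytes,
                           link_bytes_per_s, latency_s=0.0):
    """Estimate a full ring all-reduce: reduce-scatter plus all-gather.

    T = 2 * (N - 1) * (latency + payload / (N * bandwidth))
    """
    return 2 * ring_reduce_scatter_seconds(
        num_workers, payload_bytes, link_bytes_per_s, latency_s
    )


# Illustrative values only (not taken from the chapter):
# 8 workers, 4 GB of gradients, 1 GB/s per link, zero latency.
one_phase = ring_reduce_scatter_seconds(8, 4e9, 1e9)
full_op = ring_allreduce_seconds(8, 4e9, 1e9)
```

With these inputs the full operation takes exactly twice as long as a single phase, so a main-text figure and a footnote can disagree by 2x if only one of them accounts for both phases.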
Content Enhancements:
- Added inline definition for federated learning when first introduced
- Added inline definition for differential privacy with technical detail
- Improved clarity of ring all-reduce performance expectations
Academic Quality:
- All 14 bibliography entries now properly utilized
- All quantitative claims verified and accurate
- Footnotes technically sound and consistent with main text
- No AI writing patterns detected
- Learning objectives fully aligned with content
Update README.md and About page to reflect the two-volume organization:
- Volume I: Build, Optimize, Operate (foundations, single-machine)
- Volume II: Scale, Distribute, Govern (distributed, production scale)
Reference the Hennessy & Patterson pedagogical model as the guiding
framework for content placement between volumes.
- Condensed Vol II Introduction purpose to a single paragraph
- Added Purpose sections to 8 placeholder chapters:
  - Infrastructure: datacenter and cluster management
  - Storage: distributed storage for ML workloads
  - Communication: collective operations and network constraints
  - Distributed Training: large-scale training systems
  - Fault Tolerance: reliability in production ML
  - Inference: serving systems at scale
  - Edge Intelligence: edge-cloud coordination
  - ML Ops at Scale: organizational ML operations
All purposes follow the Vol I pattern: an italicized question plus one foundational paragraph explaining why the topic deserves dedicated study.
Vol II Introduction:
- Structured around Scale/Distribute/Govern themes
- Added 10 technical footnotes with systems perspective
- Added quantitative data with verified citations (14 refs)
- Added callout definitions and examples
- All claims fact-checked and corrected
- Polished academic style via stylist agent
Vol II Conclusion:
- Added bibliography with 5 citations
- Maintains synthesis of both volumes
Both volumes now self-contained with proper bridging.
Add --vol1 and --vol2 flags for building individual volumes:
- ./binder pdf --vol1     Build Volume I as PDF
- ./binder pdf --vol2     Build Volume II as PDF
- ./binder epub --vol1    Build Volume I as EPUB
- ./binder epub --vol2    Build Volume II as EPUB
- ./binder list --vol1    List Volume I chapters only
- ./binder list --vol2    List Volume II chapters only
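The flag handling listed above could be wired up roughly as follows. This is a sketch under assumptions, not the code in main.py: it assumes an argparse-based CLI, treats --vol1 and --vol2 as mutually exclusive, and stores the choice in a hypothetical `volume` attribute.

```python
import argparse


def build_parser():
    """Build a minimal parser for the pdf/epub/list commands (illustrative)."""
    parser = argparse.ArgumentParser(prog="binder")
    parser.add_argument("command", choices=["pdf", "epub", "list"])
    # Assumption: a build targets at most one volume at a time.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--vol1", dest="volume", action="store_const",
                       const="vol1", help="restrict to Volume I")
    group.add_argument("--vol2", dest="volume", action="store_const",
                       const="vol2", help="restrict to Volume II")
    return parser


args = build_parser().parse_args(["pdf", "--vol1"])
# args.command is "pdf" and args.volume is "vol1"; with neither flag given,
# args.volume is None and the caller can fall back to its default behaviour.
```

Using one shared destination keeps downstream code to a single check instead of two booleans.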
Add ambiguous chapter detection for chapters that exist in both
volumes (e.g., introduction, conclusion). When ambiguous, the CLI
shows an error with guidance to use vol1/ or vol2/ prefix.
Changes:
- discovery.py: Add AmbiguousChapterError, volume-aware discovery
- build.py: Add build_volume() method
- main.py: Add volume flag handling and updated help text
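A minimal sketch of how the ambiguous-chapter check described above might behave. Only the name AmbiguousChapterError comes from this commit; its constructor, the `resolve_chapter` helper, and the dict-of-lists chapter index are hypothetical stand-ins for whatever discovery.py actually does.

```python
class AmbiguousChapterError(Exception):
    """Raised when a bare chapter name matches more than one volume."""

    def __init__(self, name, candidates):
        self.name = name
        self.candidates = candidates
        super().__init__(
            f"Chapter '{name}' exists in multiple volumes; "
            f"use one of: {', '.join(candidates)}"
        )


def resolve_chapter(query, chapters):
    """Resolve 'name' or 'volN/name' to a 'volN/name' string.

    chapters maps a volume directory (e.g. 'vol1') to its chapter names.
    """
    if "/" in query:
        # An explicit volume prefix is never ambiguous.
        volume, name = query.split("/", 1)
        if name in chapters.get(volume, []):
            return f"{volume}/{name}"
        raise KeyError(query)

    matches = [f"{vol}/{query}"
               for vol, names in sorted(chapters.items())
               if query in names]
    if not matches:
        raise KeyError(query)
    if len(matches) > 1:
        raise AmbiguousChapterError(query, matches)
    return matches[0]


# Hypothetical index: 'introduction' and 'conclusion' live in both volumes.
CHAPTERS = {
    "vol1": ["introduction", "training", "conclusion"],
    "vol2": ["introduction", "distributed_training", "conclusion"],
}
```

Here `resolve_chapter("training", CHAPTERS)` resolves without a prefix, while `resolve_chapter("introduction", CHAPTERS)` raises the error and lists both prefixed forms as guidance.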
Rewrite introduction to establish clear vision:
- Vol I: Build, Optimize, Operate
- Vol II: Scale, Distribute, Govern
The three imperatives now drive the narrative structure:
- Scale: Infrastructure challenges at production magnitude
- Distribute: Coordination across machines, geographies, edge devices
- Govern: Security, privacy, fairness, sustainability, accountability
Replaces survey-style structure with vision-first approach
that parallels Vol I narrative arc.
Replace placeholder content with full chapters:
- Introduction: establishes scale imperative, bridges from Vol I foundations,
previews all 15 chapters across 4 parts, provides reading guidance
- Conclusion: synthesizes extended principles at scale, describes complete
system architecture, positions readers for continued growth
Both chapters maintain consistency with Vol I style while establishing
Vol II as a standalone yet connected volume.
Note: Cover images are placeholders copied from Vol I until custom
images are generated.
Update all three Quarto configuration files to reference the renamed
Responsible Engineering chapter:
- _quarto-html.yml: chapter href and bibliography path
- _quarto-pdf.yml: chapter path
- _quarto-epub.yml: chapter path and bibliography path
Rename chapter from 'Responsible Systems' to 'Responsible Engineering'
to better reflect the focus on engineering practice and mindset rather
than systems as artifacts.
- Rename folder: responsible_systems -> responsible_engr
- Rename files: responsible_systems.qmd -> responsible_engr.qmd
- Create new bib file with updated references
- Update all section IDs from sec-responsible-systems-* to
sec-responsible-engineering-*
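The section-ID update above amounts to a prefix rewrite across the chapter sources. The snippet below is one way to do it, assuming a plain regex substitution is enough; the helper name `rename_section_ids` is hypothetical and this is not necessarily how the rename was performed.

```python
import re

OLD_PREFIX = "sec-responsible-systems"
NEW_PREFIX = "sec-responsible-engineering"

# Match the old prefix only as a whole token: not preceded by a word
# character or hyphen, and not followed by a word character. A trailing
# hyphen is allowed so that suffixed IDs are rewritten too.
_OLD_ID = re.compile(r"(?<![\w-])" + re.escape(OLD_PREFIX) + r"(?!\w)")


def rename_section_ids(text):
    """Rewrite definitions ({#sec-...}) and references (@sec-...) alike."""
    return _OLD_ID.sub(NEW_PREFIX, text)


sample = (
    "## Checklist {#sec-responsible-systems-checklist}\n"
    "See @sec-responsible-systems-checklist for details.\n"
)
updated = rename_section_ids(sample)
```

Rewriting definitions and references in one pass avoids leaving dangling cross-references behind.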
Add foundational chapter on responsible ML systems engineering:
- Purpose section and learning objectives framing engineering mindset
- Engineering Responsibility Gap: Amazon, COMPAS, and YouTube case studies
illustrating when optimization succeeds but systems fail
- Responsible Engineering Checklist: pre-deployment assessment framework,
model documentation standards, population testing, incident response
- Environmental and Cost Awareness: computational costs, brain efficiency
benchmark, total cost of ownership, environmental impact
- Conclusion and Volume II Preview: bridges to Robust AI, Security/Privacy,
Responsible AI, and Sustainable AI chapters
Includes 6 citations, 2 formatted tables, 5 footnotes, and cross-references
to Vol I chapters. Focuses on engineering mindset rather than ethics philosophy.
- Revert 'this work' back to 'this textbook' in frontiers.qmd (14 instances)
- Revert 'this work' back to 'this textbook' in ai_for_good.qmd (2 instances)
- Both volumes are part of the same textbook, so this terminology is appropriate
- Change 'Inference (Volume I)' to 'Inference' in table header
- Remove '(Volume I)' from table caption
- Run table formatter for consistent column widths
Remove references to sustainable_ai, ondevice_learning, security_privacy,
robust_ai, and responsible_ai, converting to descriptive text about
sustainability, edge deployment, and robustness considerations.
Remove references to @sec-ondevice-learning, @sec-security-privacy,
and @sec-robust-ai, converting to descriptive text about edge learning,
security protocols, and fault tolerance methodologies.
- Remove cross-references to Vol II chapters from introduction
- Update conclusion with Vol I-specific content
- Add "Continuing in Volume II" section bridging to advanced topics
- Remove Part markers for Volume II content
- Update Preface to describe two-volume organization
- Revise About page with Volume I/II structure and reading paths
- Add announcement bar notice for two-volume edition
- Change "this book" references to "this work" or "this volume"
- Renamed vol2/advanced_intro to vol2/introduction for consistency
- Updated all scripts and configs to use vol1/ instead of core/
- Updated pre-commit config to check all contents/ not just vol1/
- Updated path references in Lua filters, Python scripts, and configs
Rename core/ to vol1/ and advanced/ to vol2/ for clarity.
Explicit naming makes it immediately obvious which volume
content belongs to.
Updated all path references in _quarto-html.yml and _quarto-pdf.yml
Moved 7 chapters to align file structure with volume organization:
- ondevice_learning
- privacy_security
- robust_ai
- responsible_ai
- sustainable_ai
- ai_for_good
- frontiers
Updated all path references in _quarto-html.yml and _quarto-pdf.yml
Note: Cross-volume references (@sec-*) now span core/ and advanced/,
which is expected for the two-volume structure.