From 718f8670394db345c8bb0ce6a723571b7108653e Mon Sep 17 00:00:00 2001 From: Vijay Janapa Reddi Date: Sat, 21 Feb 2026 06:58:22 -0500 Subject: [PATCH] Vol1: improve book abstracts and chapter content - Config: academic, standalone abstracts for PDF/EPUB/copyedit - Chapters: ml_systems, nn_architectures, nn_computation, training --- book/quarto/config/_quarto-epub-vol1.yml | 4 +- book/quarto/config/_quarto-html-vol1.yml | 2 +- .../config/_quarto-pdf-vol1-copyedit.yml | 4 +- book/quarto/config/_quarto-pdf-vol1.yml | 64 +++++++++---------- .../contents/vol1/ml_systems/ml_systems.qmd | 2 +- .../nn_architectures/nn_architectures.qmd | 3 + .../vol1/nn_computation/nn_computation.qmd | 4 ++ .../contents/vol1/training/training.qmd | 8 +++ 8 files changed, 53 insertions(+), 38 deletions(-) diff --git a/book/quarto/config/_quarto-epub-vol1.yml b/book/quarto/config/_quarto-epub-vol1.yml index f2224f567..07d0ec8ec 100644 --- a/book/quarto/config/_quarto-epub-vol1.yml +++ b/book/quarto/config/_quarto-epub-vol1.yml @@ -45,7 +45,7 @@ book: orcid: "0000-0002-5259-7721" abstract: | - Volume I of Machine Learning Systems presents a comprehensive introduction to understanding and engineering machine learning systems, with a focus on mastering the machine learning node. This volume establishes the foundations through four progressive stages: building conceptual models, engineering complete workflows, optimizing for real-world constraints, and deploying with confidence. + Foundations of machine learning systems engineering for single-machine deployment. ML systems are treated as infrastructure governed by physical constraints: data movement, memory bandwidth, and compute limits shape design decisions from model architecture to deployment target. The treatment progresses through conceptual models, end-to-end workflows, optimization under operational constraints, and production deployment. Quantitative reasoning and enduring principles are emphasized over transient tools. 
Suitable for undergraduate and graduate courses in computer science and engineering, and for practitioners designing flexible, efficient, and robust ML systems. repo-url: https://github.com/harvard-edge/cs249r_book @@ -53,7 +53,7 @@ book: left: | Written, edited and curated by Prof. Vijay Janapa Reddi (Harvard University) right: | - This book was built with Quarto. + Built with Quarto. chapters: - index.qmd diff --git a/book/quarto/config/_quarto-html-vol1.yml b/book/quarto/config/_quarto-html-vol1.yml index b6ff6f82e..faf01313e 100644 --- a/book/quarto/config/_quarto-html-vol1.yml +++ b/book/quarto/config/_quarto-html-vol1.yml @@ -70,7 +70,7 @@ website: text: "Volume I: Introduction" href: ./ - icon: journal - text: "Volume II: Advanced" + text: "Volume II: At Scale" href: ../vol2/ - text: "---" - icon: fire diff --git a/book/quarto/config/_quarto-pdf-vol1-copyedit.yml b/book/quarto/config/_quarto-pdf-vol1-copyedit.yml index 524343c09..ffc5c7fa9 100644 --- a/book/quarto/config/_quarto-pdf-vol1-copyedit.yml +++ b/book/quarto/config/_quarto-pdf-vol1-copyedit.yml @@ -35,7 +35,7 @@ book: roles: "Author, editor and curator." abstract: | - Volume I of Machine Learning Systems presents a comprehensive introduction to understanding and engineering machine learning systems, with a focus on mastering the machine learning node. This volume establishes the foundations through four progressive stages: building conceptual models, engineering complete workflows, optimizing for real-world constraints, and deploying with confidence. The content bridges theoretical foundations and practical engineering, emphasizing the systems context that engineers need to master when building AI solutions. While ML applications and tools evolve rapidly, the engineering principles for building ML systems remain largely consistent. This volume distills these enduring concepts for anyone seeking to build flexible, efficient, and robust ML systems. 
+ Foundations of machine learning systems engineering for single-machine deployment. ML systems are treated as infrastructure governed by physical constraints: data movement, memory bandwidth, and compute limits shape design decisions from model architecture to deployment target. The treatment progresses through conceptual models, end-to-end workflows, optimization under operational constraints, and production deployment. Quantitative reasoning and enduring principles are emphasized over transient tools. Suitable for undergraduate and graduate courses in computer science and engineering, and for practitioners designing flexible, efficient, and robust ML systems. repo-url: https://github.com/harvard-edge/cs249r_book @@ -43,7 +43,7 @@ book: left: | Written, edited and curated by Prof. Vijay Janapa Reddi (Harvard University) right: | - This book was built with Quarto. + Built with Quarto. chapters: - index.qmd diff --git a/book/quarto/config/_quarto-pdf-vol1.yml b/book/quarto/config/_quarto-pdf-vol1.yml index 030a25716..afffdab85 100644 --- a/book/quarto/config/_quarto-pdf-vol1.yml +++ b/book/quarto/config/_quarto-pdf-vol1.yml @@ -34,7 +34,7 @@ book: roles: "Author, editor and curator." abstract: | - This book presents a comprehensive introduction to understanding and engineering machine learning systems, with a focus on mastering the machine learning node. It establishes the foundations through four progressive stages: building conceptual models, engineering complete workflows, optimizing for real-world constraints, and deploying with confidence. The content bridges theoretical foundations and practical engineering, emphasizing the systems context that engineers need to master when building AI solutions. While ML applications and tools evolve rapidly, the engineering principles for building ML systems remain largely consistent. This book distills these enduring concepts for anyone seeking to build flexible, efficient, and robust ML systems. 
+ Foundations of machine learning systems engineering for single-machine deployment. ML systems are treated as infrastructure governed by physical constraints: data movement, memory bandwidth, and compute limits shape design decisions from model architecture to deployment target. The treatment progresses through conceptual models, end-to-end workflows, optimization under operational constraints, and production deployment. Quantitative reasoning and enduring principles are emphasized over transient tools. Suitable for undergraduate and graduate courses in computer science and engineering, and for practitioners designing flexible, efficient, and robust ML systems. repo-url: https://github.com/harvard-edge/cs249r_book @@ -42,7 +42,7 @@ book: left: | Written, edited and curated by Prof. Vijay Janapa Reddi (Harvard University) right: | - This book was built with Quarto. + Built with Quarto. chapters: - index.qmd @@ -50,58 +50,58 @@ book: # ================================================== # Volume I Frontmatter # ================================================== - - contents/vol1/frontmatter/dedication.qmd - - contents/vol1/frontmatter/foreword.qmd - - contents/vol1/frontmatter/about.qmd - - contents/vol1/frontmatter/acknowledgements.qmd - - contents/vol1/frontmatter/notation.qmd + # - contents/vol1/frontmatter/dedication.qmd + # - contents/vol1/frontmatter/foreword.qmd + # - contents/vol1/frontmatter/about.qmd + # - contents/vol1/frontmatter/acknowledgements.qmd + # - contents/vol1/frontmatter/notation.qmd # ================================================== # Part I: Foundations # ================================================== - - contents/vol1/parts/foundations_principles.qmd - - contents/vol1/introduction/introduction.qmd - - contents/vol1/ml_systems/ml_systems.qmd - - contents/vol1/ml_workflow/ml_workflow.qmd - - contents/vol1/data_engineering/data_engineering.qmd + # - contents/vol1/parts/foundations_principles.qmd + # - 
contents/vol1/introduction/introduction.qmd + # - contents/vol1/ml_systems/ml_systems.qmd + # - contents/vol1/ml_workflow/ml_workflow.qmd + # - contents/vol1/data_engineering/data_engineering.qmd # ================================================== # Part II: Build # ================================================== - - contents/vol1/parts/build_principles.qmd - - contents/vol1/nn_computation/nn_computation.qmd - - contents/vol1/nn_architectures/nn_architectures.qmd - - contents/vol1/frameworks/frameworks.qmd - - contents/vol1/training/training.qmd + # - contents/vol1/parts/build_principles.qmd + # - contents/vol1/nn_computation/nn_computation.qmd + # - contents/vol1/nn_architectures/nn_architectures.qmd + # - contents/vol1/frameworks/frameworks.qmd + # - contents/vol1/training/training.qmd # ================================================== # Part III: Optimize # ================================================== - - contents/vol1/parts/optimize_principles.qmd - - contents/vol1/data_selection/data_selection.qmd - - contents/vol1/optimizations/model_compression.qmd - - contents/vol1/hw_acceleration/hw_acceleration.qmd + # - contents/vol1/parts/optimize_principles.qmd + # - contents/vol1/data_selection/data_selection.qmd + # - contents/vol1/optimizations/model_compression.qmd + # - contents/vol1/hw_acceleration/hw_acceleration.qmd - contents/vol1/benchmarking/benchmarking.qmd # ================================================== # Part IV: Deploy # ================================================== - - contents/vol1/parts/deploy_principles.qmd - - contents/vol1/model_serving/model_serving.qmd - - contents/vol1/ml_ops/ml_ops.qmd - - contents/vol1/responsible_engr/responsible_engr.qmd - - contents/vol1/conclusion/conclusion.qmd - - contents/vol1/backmatter/references.qmd + # - contents/vol1/parts/deploy_principles.qmd + # - contents/vol1/model_serving/model_serving.qmd + # - contents/vol1/ml_ops/ml_ops.qmd + # - contents/vol1/responsible_engr/responsible_engr.qmd + # 
- contents/vol1/conclusion/conclusion.qmd + # - contents/vol1/backmatter/references.qmd # ================================================== # Appendices (uses Appendix A, B, C... numbering) # ================================================== # appendices: - - contents/vol1/backmatter/appendix_dam.qmd - - contents/vol1/backmatter/appendix_machine.qmd - - contents/vol1/backmatter/appendix_algorithm.qmd - - contents/vol1/backmatter/appendix_data.qmd - - contents/vol1/backmatter/glossary/glossary.qmd + # - contents/vol1/backmatter/appendix_dam.qmd + # - contents/vol1/backmatter/appendix_machine.qmd + # - contents/vol1/backmatter/appendix_algorithm.qmd + # - contents/vol1/backmatter/appendix_data.qmd + # - contents/vol1/backmatter/glossary/glossary.qmd bibliography: - contents/vol1/backmatter/references.bib diff --git a/book/quarto/contents/vol1/ml_systems/ml_systems.qmd b/book/quarto/contents/vol1/ml_systems/ml_systems.qmd index f87035ffa..4f18e2f0d 100644 --- a/book/quarto/contents/vol1/ml_systems/ml_systems.qmd +++ b/book/quarto/contents/vol1/ml_systems/ml_systems.qmd @@ -705,7 +705,7 @@ These archetypes map naturally to deployment paradigms: **Compute Beasts** and * # └───────────────────────────────────────────────────────────────────────────── from mlsys import Models from mlsys.constants import ( - RESNET50_FLOPs, GFLOPs, Mparam, Bparam, byte, MB, GB + RESNET50_FLOPs, GFLOPs, Mparam, Bparam, Kparam, byte, MB, GB, KB ) from mlsys.formatting import fmt, check diff --git a/book/quarto/contents/vol1/nn_architectures/nn_architectures.qmd b/book/quarto/contents/vol1/nn_architectures/nn_architectures.qmd index 1d7dbb39c..baf305888 100644 --- a/book/quarto/contents/vol1/nn_architectures/nn_architectures.qmd +++ b/book/quarto/contents/vol1/nn_architectures/nn_architectures.qmd @@ -297,6 +297,9 @@ class LighthouseSpecs: a100_mem_str = fmt(a100_mem, precision=0) +# Transformer quadratic scaling: doubling sequence length quadruples attention memory 
+transformer_scaling_ratio_str = "4" + # Note: No exports. Use LighthouseSpecs.variable directly. ``` diff --git a/book/quarto/contents/vol1/nn_computation/nn_computation.qmd b/book/quarto/contents/vol1/nn_computation/nn_computation.qmd index 0102fe245..5f886382b 100644 --- a/book/quarto/contents/vol1/nn_computation/nn_computation.qmd +++ b/book/quarto/contents/vol1/nn_computation/nn_computation.qmd @@ -66,6 +66,10 @@ Neural networks reduce to a small set of mathematical operations. Matrix multipl from mlsys.constants import * from mlsys.formatting import fmt, sci from mlsys.formulas import model_memory + +# MNIST 784→128→64→10 MAC count (used in Purpose / From Logic to Arithmetic) +_inf_madd = 784 * 128 + 128 * 64 + 64 * 10 # 109,184 +inf_madd_total_str = f"{_inf_madd:,}" ``` ## From Logic to Arithmetic {#sec-neural-computation-deep-learning-systems-engineering-foundation-597f} diff --git a/book/quarto/contents/vol1/training/training.qmd b/book/quarto/contents/vol1/training/training.qmd index cdb87668e..2b8ac6fcb 100644 --- a/book/quarto/contents/vol1/training/training.qmd +++ b/book/quarto/contents/vol1/training/training.qmd @@ -17,6 +17,14 @@ engine: jupyter ::: +```{python} +#| label: purpose-anchor +#| echo: false +# Early anchor for Purpose section inline refs (training-setup runs later) +gpt2_train_cost_str = "$50,000" +gpt2_inf_cost_str = "$0.0001" +``` + ## Purpose {.unnumbered} \begin{marginfigure}
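
Below the patch, a quick sanity check of the numeric claims the added `+` lines hard-code: the MNIST MAC count in `nn_computation.qmd` and the quadratic attention-scaling ratio in `nn_architectures.qmd`. This is a standalone sketch for review purposes; the variable names are illustrative and the book's `mlsys` helpers are deliberately not imported.

```python
# Verify constants introduced by this patch (standalone; no mlsys imports).

# nn_computation.qmd: multiply-accumulate count for the
# MNIST 784 -> 128 -> 64 -> 10 fully connected network.
layers = [784, 128, 64, 10]
macs = sum(fan_in * fan_out for fan_in, fan_out in zip(layers, layers[1:]))
assert macs == 109_184  # matches the patch's "# 109,184" comment

# nn_architectures.qmd: standard self-attention memory grows with
# seq_len**2, so doubling sequence length quadruples it.
seq_len = 1024  # illustrative baseline
ratio = (2 * seq_len) ** 2 // seq_len ** 2
assert ratio == 4  # matches transformer_scaling_ratio_str = "4"

print(f"{macs:,}", ratio)  # prints: 109,184 4
```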