From e3d4deeb1288065db3444b0be28d1bb597fff741 Mon Sep 17 00:00:00 2001 From: Vijay Janapa Reddi Date: Sat, 31 Jan 2026 19:45:52 -0500 Subject: [PATCH] Standardize volume titles to full 'Machine Learning Systems' naming Replaces abbreviated 'ML Systems' with 'Machine Learning Systems' across volume titles, part openers, frontmatter, summaries, docs, and landing page for consistency with MIT Press submission standards. Also renames principle callout anchors from the `nte-` prefix to `principle-`, aligns roadmap chapter titles with their section cross-references, and expands the Volume I about pages. --- book/docs/PART_KEY_VALIDATION.md | 4 ++-- book/docs/VOLUME_STRUCTURE.md | 18 +++++++------- .../contents/vol1/frontmatter/about.qmd | 24 ++++++++++++++----- .../contents/vol1/parts/build_principles.qmd | 12 +++++----- .../contents/vol1/parts/deploy_principles.qmd | 10 ++++---- .../vol1/parts/foundations_principles.qmd | 14 +++++------ .../vol1/parts/optimize_principles.qmd | 12 +++++----- book/quarto/contents/vol1/parts/summaries.yml | 4 ++-- .../contents/vol2/frontmatter/about.qmd | 2 +- .../contents/vol2/frontmatter/foreword.qmd | 2 +- book/quarto/contents/vol2/index.qmd | 3 +-- book/quarto/contents/vol2/parts/summaries.yml | 4 ++-- landing/index.html | 4 ++-- 13 files changed, 62 insertions(+), 51 deletions(-) diff --git a/book/docs/PART_KEY_VALIDATION.md b/book/docs/PART_KEY_VALIDATION.md index 7beb756d8..bebcd0a72 100644 --- a/book/docs/PART_KEY_VALIDATION.md +++ b/book/docs/PART_KEY_VALIDATION.md @@ -54,12 +54,12 @@ The following keys are defined in `quarto/contents/parts/summaries.yml`: | Key | Title | Type | |-----|-------|------| | `frontmatter` | Frontmatter | Division | -| `volume_1` | Volume I: Introduction to ML Systems | Division | +| `volume_1` | Volume I: Introduction to Machine Learning Systems | Division | | `vol1_foundations` | ML Foundations | Part | | `vol1_development` | System Development | Part | | `vol1_optimization` | Model Optimization | Part | | `vol1_operations` | System Operations | Part | -| `volume_2` | Volume II: Advanced ML Systems | Division | +| `volume_2` | Volume II: Advanced Machine Learning Systems | Division 
| | `vol2_scale` | Foundations of Scale | Part | | `vol2_distributed` | Distributed Systems | Part | | `vol2_production` | Production Challenges | Part | diff --git a/book/docs/VOLUME_STRUCTURE.md b/book/docs/VOLUME_STRUCTURE.md index d1e4bf50a..3d8316f9b 100644 --- a/book/docs/VOLUME_STRUCTURE.md +++ b/book/docs/VOLUME_STRUCTURE.md @@ -28,7 +28,7 @@ A reader completes Volume I and can competently build, optimize, and deploy ML s - Practitioners transitioning into ML systems ### Course Mapping -- Single semester "Introduction to ML Systems" course +- Single semester "Introduction to Machine Learning Systems" course - Foundation for more advanced distributed systems or MLOps courses ### Structure (16 chapters) @@ -40,16 +40,16 @@ Establish the conceptual framework for understanding ML as a systems discipline. |----|-------|---------| | 1 | Introduction | Why ML systems thinking matters | | 2 | ML Systems | Survey of the field, deployment paradigms | -| 3 | Deep Learning Primer | Mathematical and conceptual foundations | -| 4 | DNN Architectures | CNNs, RNNs, Transformers, architectural choices | +| 3 | Workflow | End-to-end ML development process | +| 4 | Data Engineering | Pipelines, preprocessing, data quality | -#### Part II: Development -Practical skills for constructing ML systems from data to trained model. +#### Part II: Build +The technical implementation of machine learning systems from math to trained models. | Ch | Title | Purpose | |----|-------|---------| -| 5 | Workflow | End-to-end ML development process | -| 6 | Data Engineering | Pipelines, preprocessing, data quality | +| 5 | DL Primer | Mathematical and conceptual foundations | +| 6 | DNN Architectures | CNNs, RNNs, Transformers, architectural choices | | 7 | Frameworks | PyTorch, TensorFlow, JAX ecosystem | | 8 | Training | Training loops, optimization, debugging | @@ -58,8 +58,8 @@ Techniques for making ML systems efficient and fast. 
| Ch | Title | Purpose | |----|-------|---------| -| 9 | Efficient AI | Why efficiency matters, scaling laws, metrics | -| 10 | Optimizations | Quantization, pruning, distillation | +| 9 | Data Selection | Optimizing information, active learning, pruning | +| 10 | Model Compression | Quantization, pruning, distillation | | 11 | Hardware Acceleration | GPUs, TPUs, custom accelerators | | 12 | Benchmarking | Measuring performance, MLPerf | diff --git a/book/quarto/contents/vol1/frontmatter/about.qmd b/book/quarto/contents/vol1/frontmatter/about.qmd index c1b21c9eb..c4e908d65 100644 --- a/book/quarto/contents/vol1/frontmatter/about.qmd +++ b/book/quarto/contents/vol1/frontmatter/about.qmd @@ -8,13 +8,22 @@ Most readers will not become ML hardware architects or compiler engineers. Yet e ## Why a Systems Textbook for Machine Learning {#sec-about-why-systems} -In 1968, a NATO conference in Garmisch, Germany, coined the term *software engineering*. The motivation was a crisis: software systems had grown too complex for ad hoc programming to manage. Building reliable software required its own discipline, with its own principles, methodologies, and rigor. Nearly a decade later, in 1977, computer engineering emerged as a distinct field, recognizing that designing hardware systems demanded more than applied physics. In both cases, the practitioners already existed. The body of knowledge already existed. What was missing was the institutional recognition that these activities constituted a discipline. +In 1968, a NATO conference in Garmisch, Germany, coined the term *software engineering* to address a crisis: software systems had grown too complex for ad hoc programming to manage. Building reliable software required its own discipline, with its own principles, methodologies, and rigor. 
Decades later, Hennessy and Patterson transformed computer architecture from an art into a quantitative science, giving engineers the tools to measure, predict, and optimize processor performance from first principles. In both cases, the practitioners already existed. The body of knowledge already existed. What was missing was the recognition that these activities constituted a discipline. -Machine learning faces an analogous moment. Everyone wants to build models, but models alone are not systems. A neural network that achieves state-of-the-art accuracy in a notebook may fail to serve a single user in production if the data pipeline is brittle, the serving infrastructure cannot meet latency requirements, or the hardware is mismatched to the workload. The gap between a trained model and a working ML system is an engineering gap, and closing it requires an engineering discipline. +Machine learning faces its own Garmisch moment. -If AI is electricity, someone must build more than the appliances. Who engineers the generators? Who builds the transmission lines? Who designs the transformers that step voltage down for homes and offices? That is AI engineering: the training infrastructure, the serving systems, the optimization for edge devices, the monitoring that catches failures before users do. Computer science asks what is computable. Computer engineering asks how to build systems that compute reliably. ML research asks what models can learn. AI engineering asks how to build systems that learn reliably, serve efficiently, fail safely, and scale sustainably. +For the past decade, the field has been dominated by *model-centric* thinking—a focus on discovering new algorithms and architectures. But a model is not a system. A neural network that achieves state-of-the-art accuracy in a research notebook is useless if it cannot be served within latency constraints, if its data pipeline cannot scale to petabytes, or if its energy cost exceeds its economic value. 
The gap between a working model and a production system is not just a matter of "coding harder"; it is a gap in fundamental engineering principles. -This book treats that discipline as its subject. It sits at the intersection of ML theory, ML systems, and applied ML, teaching the practice of building efficient, reliable, and robust intelligent systems that operate in the real world, not just models in isolation. +This book treats ML systems engineering not as a collection of tools --- Kubernetes, PyTorch, CUDA --- but as a discipline governed by physical invariants. Just as civil engineers cannot ignore gravity, AI engineers cannot ignore the constraints that govern every system they build: + +* **Data Gravity**: The physical cost of moving information. +* **Compute Density**: The limits of silicon throughput and thermal envelopes. +* **The Iron Law**: The immutable relationship between data movement, arithmetic intensity, and system performance. +* **Conservation of Complexity**: The reality that simplifying code often shifts complexity to data pipelines or infrastructure. + +If computer science asks "what is computable?" and ML research asks "what is learnable?", then AI engineering asks "what is buildable, given the physics of real hardware and real data?" + +This book is the foundation for that discipline. We aim to do for AI systems what Hennessy and Patterson did for computer architecture: replace intuition with measurement, and art with engineering. The goal is to teach you to reason about the system *before* you build it, treating data, algorithms, and hardware not as separate silos but as a single, coupled optimization problem. Consider a box of LEGO bricks. The same interlocking pieces build a spaceship, a castle, or an entire city. The sets change; the bricks do not. ML systems work the same way. 
A convolutional network for image classification, a transformer for language generation, and a reinforcement learning agent for robotics are vastly different applications, but beneath them sit the same building blocks: computational graphs, memory hierarchies, data pipelines, optimization loops, and the tradeoffs between latency, throughput, and energy. Master the building blocks and you can construct any system. @@ -100,7 +109,7 @@ This book is open source. The full text, figures, and build system are publicly ## About the Companion Volume {#sec-about-companion-volume} -This is the first of two volumes. Volume I covers the single compute node: one machine, one to eight accelerators, a shared memory space. A companion volume, *Machine Learning Systems, Volume II: Advanced ML Systems*, extends these foundations to the distributed scale. +This is the first of two volumes. Volume I covers the single compute node: one machine, one to eight accelerators, a shared memory space. A companion volume, *Advanced Machine Learning Systems*, extends these foundations to the distributed scale. Where this volume asks how to build, optimize, and deploy ML systems on a single machine, Volume II asks how to coordinate computation across thousands of machines. It covers distributed training strategies, fleet-scale infrastructure, production deployment at global scale, and the governance challenges that arise when ML systems operate in the real world. @@ -116,10 +125,13 @@ Volume II is forthcoming. Updates and early access materials are available at ** ## Using This Book in a Course {#sec-about-course-use} -This book grew out of CS249r at Harvard University and the TinyML edX professional certificate. It is designed to support a one-semester course covering the full ML systems stack. Instructors may also select individual parts for shorter modules: +This book grew out of CS249r at Harvard University and the TinyML edX professional certificate program. 
The TinyML course taught ML systems at the most constrained end of the spectrum --- microcontrollers running on milliwatts with kilobytes of memory --- and revealed a surprising lesson: the fundamental constraints that govern tiny devices (memory bandwidth, compute density, energy per operation) are the same constraints that govern data center GPUs. The physics scales; only the numbers change. That insight shaped this book. What began as a course on resource-constrained ML matured into a comprehensive treatment of ML systems engineering, grounded in the principle that mastering the fundamentals at any scale prepares you for every scale. + +The book is designed to support a one-semester course covering the full ML systems stack. Instructors may also select individual parts for shorter modules: - **Parts I--II** (Foundations and Build) suit an introductory half-semester on ML development fundamentals. - **Parts III--IV** (Optimize and Deploy) suit an applied half-semester on production ML engineering. +- **Principles Only** (Part introductions: @sec-part-foundations, @sec-part-build, @sec-part-optimize, @sec-part-deploy) suits a condensed overview of ML systems invariants for survey courses or executive programs. ::: {.content-visible when-format="html"} Lecture slides, assignments, and instructor resources are available at [mlsysbook.ai](https://mlsysbook.ai). diff --git a/book/quarto/contents/vol1/parts/build_principles.qmd b/book/quarto/contents/vol1/parts/build_principles.qmd index 2e0ccb143..9505c73c2 100644 --- a/book/quarto/contents/vol1/parts/build_principles.qmd +++ b/book/quarto/contents/vol1/parts/build_principles.qmd @@ -8,7 +8,7 @@ If Part I established *what* limits us, Part II addresses *how* we design around Four principles govern the construction of ML systems. 
We begin with the fundamental equation that determines all ML performance: -::: {#nte-iron-law .callout-principle title="The Iron Law of ML Systems" icon="false"} +::: {#principle-iron-law .callout-principle title="The Iron Law of ML Systems" icon="false"} **The Invariant**: The latency ($L$) of any machine learning operation is determined by the sum of three components, divided by the system's efficiency ($\eta$): $$ L = \frac{\text{Data Movement} + \text{Compute} + \text{System Overhead}}{\eta} $$ @@ -17,7 +17,7 @@ $$ L = \frac{\text{Data Movement} + \text{Compute} + \text{System Overhead}}{\et The Iron Law tells us *what* to optimize, but not *how*. The answer depends on which hardware resource your architecture will saturate—a choice that defines an implicit contract: -::: {#nte-silicon-contract .callout-principle title="The Silicon Contract" icon="false"} +::: {#principle-silicon-contract .callout-principle title="The Silicon Contract" icon="false"} **The Invariant**: Every model architecture makes an implicit agreement with the hardware. - **ResNet-50** assumes high-density floating-point compute (Compute-Bound). @@ -29,7 +29,7 @@ The Iron Law tells us *what* to optimize, but not *how*. The answer depends on w Once you've committed to a hardware target, efficiency emerges from how you distribute computation across layers. Not all layers are created equal: -::: {#nte-depth-efficiency .callout-principle title="The Depth-Efficiency Law" icon="false"} +::: {#principle-depth-efficiency .callout-principle title="The Depth-Efficiency Law" icon="false"} **The Invariant**: In hierarchical feature extractors, computational efficiency is maximized when early layers process high-resolution, low-semantic data (edges, textures) and later layers process low-resolution, high-semantic data (objects, concepts). **The Implication**: Architectures should be tapered (e.g., Pyramidal CNNs). 
Allocating equal compute to every layer is inefficient; capacity should be concentrated where semantic density is highest. @@ -37,7 +37,7 @@ Once you've committed to a hardware target, efficiency emerges from how you dist Depth enables expressiveness, but it introduces its own constraint. As networks grow deeper, a mathematical invariant determines whether learning is even possible: -::: {#nte-residual-gradient .callout-principle title="The Residual Gradient Invariant" icon="false"} +::: {#principle-residual-gradient .callout-principle title="The Residual Gradient Invariant" icon="false"} **The Invariant**: Deep network trainability is preserved only if there exists a direct path for gradient flow that bypasses non-linear transformations. $$ \frac{\partial L}{\partial x} \approx 1 $$ @@ -46,9 +46,9 @@ $$ \frac{\partial L}{\partial x} \approx 1 $$ ## Part II Roadmap {#sec-build-roadmap} -The chapters translate these principles into the components of the ML stack. Each chapter assembles one leg of the **DAM Taxonomy**: we define the **Logic** (Algorithm), provision the **Physics** (Machine), and refine the **Information** (Data). +The chapters translate these principles into the components of the ML stack. We focus on the constructive elements of the **DAM Taxonomy**: defining the **Logic** (Algorithm) through architectures, and provisioning the **Physics** (Machine) through frameworks and training systems. -1. **Deep Learning Foundations (@sec-deep-learning-systems-foundations)**: The mathematical foundations of gradient flow—backpropagation, loss landscapes, and the calculus that makes learning possible. +1. **Deep Learning Systems Foundations (@sec-deep-learning-systems-foundations)**: The mathematical foundations of gradient flow—backpropagation, loss landscapes, and the calculus that makes learning possible. 2. **DNN Architectures (@sec-dnn-architectures)**: Defining the **Logic**. How we structure the Silicon Contract for Vision, Language, and Recommendations. 
Each architecture family represents a different bet on which hardware resource to saturate. diff --git a/book/quarto/contents/vol1/parts/deploy_principles.qmd b/book/quarto/contents/vol1/parts/deploy_principles.qmd index 231afa80a..0978a1727 100644 --- a/book/quarto/contents/vol1/parts/deploy_principles.qmd +++ b/book/quarto/contents/vol1/parts/deploy_principles.qmd @@ -8,7 +8,7 @@ These principles define the "Physics of Reliability"—the laws that govern why Four principles govern the reliability of deployed ML systems. We begin with what makes ML verification fundamentally different from traditional software: -::: {#nte-verification-gap .callout-principle title="The Verification Gap" icon="false"} +::: {#principle-verification-gap .callout-principle title="The Verification Gap" icon="false"} **The Invariant**: In traditional software, you verify behavior using **Unit Tests** (asserting that $f(x) == y$). In machine learning, you verify behavior using **Statistical Bounds**: $$ P(f(X) \approx Y) > 1 - \epsilon $$ @@ -17,7 +17,7 @@ $$ P(f(X) \approx Y) > 1 - \epsilon $$ The Verification Gap means we cannot prove correctness—we can only bound statistical performance. But even those bounds erode over time. The world does not stand still: -::: {#nte-statistical-drift .callout-principle title="The Statistical Drift Invariant" icon="false"} +::: {#principle-statistical-drift .callout-principle title="The Statistical Drift Invariant" icon="false"} **The Invariant**: Unlike traditional software, which fails only when code changes, ML systems fail because the *environment* changes. Reliability degrades monotonically over time as the world moves away from the training distribution. **The Implication**: Observability must shift from system metrics (latency, errors) to statistical metrics (distribution distance). A system without **Data Drift Monitoring** is a system in a state of unobserved decay. 
@@ -25,7 +25,7 @@ The Verification Gap means we cannot prove correctness—we can only bound stati External drift is not the only threat. Even when the world holds still, your own systems can diverge. A subtler failure mode emerges from inconsistency between training and serving: -::: {#nte-training-serving-skew .callout-principle title="The Training-Serving Skew Law" icon="false"} +::: {#principle-training-serving-skew .callout-principle title="The Training-Serving Skew Law" icon="false"} **The Invariant**: Model performance degrades (typically 5–15%) whenever the serving data distribution or feature logic diverges from the training environment. **The Implication**: Feature consistency is a hard architectural requirement. **Feature Stores** are not just caches; they are consistency engines that ensure the mathematical function computed at inference is identical to the one learned during training. @@ -33,7 +33,7 @@ External drift is not the only threat. Even when the world holds still, your own Beneath all these reliability concerns lies a non-negotiable constraint. In production, you cannot trade latency for anything else: -::: {#nte-latency-budget .callout-principle title="The Latency Budget Invariant" icon="false"} +::: {#principle-latency-budget .callout-principle title="The Latency Budget Invariant" icon="false"} **The Invariant**: In real-time serving, **P99 Latency** is the hard constraint; throughput is the variable to be optimized within that constraint. **The Implication**: Serving systems must implement **Tail-Tolerant** designs (e.g., dynamic batching, hedged requests). You must be willing to sacrifice overall throughput to meet the latency deadline of the oldest request in the queue. @@ -45,7 +45,7 @@ The chapters bridge the gap between benchmark performance and production reality 1. 
**Model Serving Systems (@sec-model-serving-systems)**: Translating optimized models into production services—load balancing, batching strategies, and the infrastructure that turns benchmarks into real-world throughput. -2. **Machine Learning Operations (@sec-machine-learning-operations-mlops)**: The MLOps lifecycle—automated retraining, monitoring, versioning, and the continuous validation that catches drift before users do. +2. **Machine Learning Operations (MLOps) (@sec-machine-learning-operations-mlops)**: The MLOps lifecycle—automated retraining, monitoring, versioning, and the continuous validation that catches drift before users do. 3. **Responsible Engineering (@sec-responsible-engineering)**: Ensuring fairness, safety, and transparency in high-stakes deployments—because a fast, reliable system that causes harm is still a failure. diff --git a/book/quarto/contents/vol1/parts/foundations_principles.qmd b/book/quarto/contents/vol1/parts/foundations_principles.qmd index 41766383f..0295befa0 100644 --- a/book/quarto/contents/vol1/parts/foundations_principles.qmd +++ b/book/quarto/contents/vol1/parts/foundations_principles.qmd @@ -8,7 +8,7 @@ Before diving into architectures, frameworks, and optimizations, we must first u Four principles define the physics of ML systems. We begin with the meta-principle that governs all system design: -::: {#nte-conservation-complexity .callout-principle title="Conservation of Complexity" icon="false"} +::: {#principle-conservation-complexity .callout-principle title="Conservation of Complexity" icon="false"} **The Invariant**: You cannot destroy complexity; you can only displace it. In ML systems, complexity flows between the **Information** (Data), the **Logic** (Algorithm), and the **Physics** (Machine). 
$$ \Delta C_{total} = \Delta C_{data} + \Delta C_{algorithm} + \Delta C_{machine} \approx 0 $$ @@ -17,7 +17,7 @@ $$ \Delta C_{total} = \Delta C_{data} + \Delta C_{algorithm} + \Delta C_{machine If complexity must live somewhere, where does it reside in ML systems? The answer reveals the fundamental difference between traditional software and machine learning: -::: {#nte-data-as-code .callout-principle title="The Data as Code Invariant" icon="false"} +::: {#principle-data-as-code .callout-principle title="The Data as Code Invariant" icon="false"} **The Invariant**: Data *is* the source code of the ML system. A change in the training dataset ($\Delta D$) is functionally equivalent to a change in the executable logic ($\Delta P$). $$ \text{System Behavior} \approx f(\text{Data}) $$ @@ -26,7 +26,7 @@ $$ \text{System Behavior} \approx f(\text{Data}) $$ But data is not just logically important—it has physical properties that constrain system architecture. Unlike code, which can be copied and distributed freely, data resists movement: -::: {#nte-data-gravity .callout-principle title="The Data Gravity Invariant" icon="false"} +::: {#principle-data-gravity .callout-principle title="The Data Gravity Invariant" icon="false"} **The Invariant**: Data possesses mass. As dataset scale ($D$) increases, the cost (latency, bandwidth, energy) of moving data exceeds the cost of moving compute. $$ C_{move}(D) \gg C_{move}(Compute) $$ @@ -35,7 +35,7 @@ $$ C_{move}(D) \gg C_{move}(Compute) $$ Data gravity explains *where* data lives; the final principle explains how data *flows*. 
Even with perfectly positioned data, systems face a fundamental throughput constraint: -::: {#nte-pipeline-stall .callout-principle title="The Pipeline Stall Law" icon="false"} +::: {#principle-pipeline-stall .callout-principle title="The Pipeline Stall Law" icon="false"} **The Invariant**: The throughput of a training system is strictly bounded by the slowest component in the data pipeline, not the fastest accelerator. $$ T_{system} = \min(T_{IO}, T_{CPU}, T_{GPU}) $$ @@ -48,10 +48,10 @@ The chapters in Part I build the conceptual foundation for everything that follo 1. **Introduction (@sec-introduction)**: Why ML systems engineering exists as a discipline. We establish the core metric—*Samples per Dollar*—and frame the optimization problem that the rest of the book solves. -2. **ML System Architecture (@sec-ml-system-architecture)**: How physical constraints (speed of light, power wall, memory wall) create the deployment spectrum from cloud to TinyML. This chapter applies the Pipeline Stall Law to explain why different environments demand fundamentally different architectures. +2. **ML System Architecture (@sec-ml-system-architecture)**: How physical constraints (speed of light, power wall, memory wall) create the deployment spectrum from cloud to TinyML. This chapter applies the **Pipeline Stall Law** (@principle-pipeline-stall) to explain why different environments demand fundamentally different architectures. -3. **The AI Development Workflow (@sec-ai-development-workflow)**: The lifecycle that manages the Conservation of Complexity. From data collection through deployment, we trace how complexity moves between stages and how teams coordinate that flow. +3. **The AI Development Workflow (@sec-ai-development-workflow)**: The lifecycle that manages the **Conservation of Complexity** (@principle-conservation-complexity). From data collection through deployment, we trace how complexity moves between stages and how teams coordinate that flow. -4. 
**Data Engineering for ML (@sec-data-engineering-ml)**: The practical application of the Data-as-Code and Data Gravity invariants. This chapter covers the tools and techniques for treating data with the same engineering rigor as source code. +4. **Data Engineering for ML (@sec-data-engineering-ml)**: The practical application of the **Data-as-Code** (@principle-data-as-code) and **Data Gravity** (@principle-data-gravity) invariants. This chapter covers the tools and techniques for treating data with the same engineering rigor as source code. By the end of Part I, you will have internalized the invariants that constrain every ML system and learned to reason about where complexity must reside in your designs. With these foundations in place, Part II turns from understanding constraints to constructing systems that operate within them, beginning with the mathematical machinery of deep learning and progressing through architectures, frameworks, and training. diff --git a/book/quarto/contents/vol1/parts/optimize_principles.qmd b/book/quarto/contents/vol1/parts/optimize_principles.qmd index e3d15b168..5a445bdf0 100644 --- a/book/quarto/contents/vol1/parts/optimize_principles.qmd +++ b/book/quarto/contents/vol1/parts/optimize_principles.qmd @@ -8,7 +8,7 @@ These principles define the "Physics of Efficiency"—the laws that determine wh Four principles govern the efficiency of ML systems. We begin by acknowledging that optimization is inherently multi-dimensional: -::: {#nte-pareto-frontier .callout-principle title="The Pareto Frontier" icon="false"} +::: {#principle-pareto-frontier .callout-principle title="The Pareto Frontier" icon="false"} **The Invariant**: Optimization is not a single-objective goal. It is a multi-dimensional search for the **Pareto Frontier**—the boundary where you cannot improve one metric without degrading another. - **Quantization** trades numerical precision for memory bandwidth. 
@@ -20,7 +20,7 @@ Four principles govern the efficiency of ML systems. We begin by acknowledging t Navigating the Pareto Frontier requires knowing *which* resource to optimize. Before selecting a technique, you must diagnose the bottleneck: -::: {#nte-arithmetic-intensity .callout-principle title="Arithmetic Intensity Law" icon="false"} +::: {#principle-arithmetic-intensity .callout-principle title="Arithmetic Intensity Law" icon="false"} **The Invariant**: System throughput ($P$) is bounded by the minimum of peak compute ($P_{peak}$) and memory bandwidth ($B_{mem}$) relative to the workload's arithmetic intensity ($I$): $$ P = \min(P_{peak}, I \times B_{mem}) $$ @@ -29,7 +29,7 @@ $$ P = \min(P_{peak}, I \times B_{mem}) $$ The Roofline reveals whether you're limited by compute or memory bandwidth. But *why* does memory bandwidth matter so much? The answer lies in the physics of data movement: -::: {#nte-energy-movement .callout-principle title="The Energy-Movement Invariant" icon="false"} +::: {#principle-energy-movement .callout-principle title="The Energy-Movement Invariant" icon="false"} **The Invariant**: Moving 1 bit of data from DRAM costs 100–500× more energy than performing an arithmetic operation on it. $$ E_{move} \gg E_{compute} $$ @@ -38,7 +38,7 @@ $$ E_{move} \gg E_{compute} $$ Even with perfect data locality and optimal bottleneck targeting, a final constraint limits how much speedup is achievable. No optimization can escape Amdahl's ceiling: -::: {#nte-amdahls-law .callout-principle title="Amdahl's Law" icon="false"} +::: {#principle-amdahls-law .callout-principle title="Amdahl's Law" icon="false"} **The Invariant**: The maximum speedup of a system is limited by the fraction of the workload that cannot be accelerated. $$ \text{Speedup} = \frac{1}{(1-p) + \frac{p}{s}} $$ where $p$ is the parallelizable fraction and $s$ is the speedup of that fraction. @@ -90,8 +90,8 @@ We structure our optimization journey by following the **DAM Taxonomy** in this 2. 
**Model Compression (@sec-model-compression)**: **Optimizing Logic**. Once we have the right data, we optimize the blueprint. We reduce complexity through pruning, quantization, and knowledge distillation—extracting the same capability from fewer parameters. This optimizes the $Ops$ term. We call this "Model Compression" because we are compressing the logical representation of the task. -3. **Hardware Acceleration (@sec-ai-acceleration)**: **Optimizing Physics**. Finally, we map the compressed blueprint onto silicon. We design software kernels and memory hierarchies that match the physical reality of the hardware. This optimizes the $Throughput \times Utilization$ denominator. We call this "Hardware Acceleration" because we are exploiting the specific physics of the machine (GPUs, TPUs) to execute the logic. +3. **AI Acceleration (@sec-ai-acceleration)**: **Optimizing Physics**. Finally, we map the compressed blueprint onto silicon. We design software kernels and memory hierarchies that match the physical reality of the hardware. This optimizes the $Throughput \times Utilization$ denominator. We call this "AI Acceleration" because we are exploiting the specific physics of the machine (GPUs, TPUs) to execute the logic. -4. **Benchmarking (@sec-benchmarking-ai)**: **Validating the System**. Optimization without measurement is guesswork. We learn to measure performance reliably and diagnose bottlenecks systematically. When performance stalls, we ask: is the bottleneck in the Information (Data), the Logic (Algorithm), or the Physics (Machine)? +4. **Benchmarking AI (@sec-benchmarking-ai)**: **Validating the System**. Optimization without measurement is guesswork. We learn to measure performance reliably and diagnose bottlenecks systematically. When performance stalls, we ask: is the bottleneck in the Information (Data), the Logic (Algorithm), or the Physics (Machine)? 
By the end of Part III, you will be able to diagnose bottlenecks, navigate the Pareto frontier of competing objectives, and apply systematic optimizations across data, algorithms, and hardware. With an efficient model in hand, Part IV addresses the final and most unforgiving challenge: deploying that model into production, where statistical drift, serving constraints, and real-world reliability determine whether your system delivers lasting value. diff --git a/book/quarto/contents/vol1/parts/summaries.yml b/book/quarto/contents/vol1/parts/summaries.yml index efd234aef..7159acc63 100644 --- a/book/quarto/contents/vol1/parts/summaries.yml +++ b/book/quarto/contents/vol1/parts/summaries.yml @@ -17,14 +17,14 @@ parts: The main content of the textbook organized into thematic parts. # ============================================================================= - # VOLUME I: INTRODUCTION TO ML SYSTEMS + # VOLUME I: INTRODUCTION TO MACHINE LEARNING SYSTEMS # ============================================================================= - key: "volume_1" division: "mainmatter" type: "division" numbered: false - title: "Volume I: Introduction to ML Systems" + title: "Volume I: Introduction to Machine Learning Systems" description: > Volume I provides a comprehensive introduction to machine learning systems, focusing on the foundational concepts and engineering practices for single-machine systems. diff --git a/book/quarto/contents/vol2/frontmatter/about.qmd b/book/quarto/contents/vol2/frontmatter/about.qmd index 533d12ed0..9baf075d4 100644 --- a/book/quarto/contents/vol2/frontmatter/about.qmd +++ b/book/quarto/contents/vol2/frontmatter/about.qmd @@ -2,7 +2,7 @@ ## Overview {#sec-volume-overview-52bf} -Volume II: Advanced ML Systems addresses the complexities of **building and managing the machine learning fleet**. 
This volume follows the Hennessy & Patterson pedagogical model, building upon single-node foundations to tackle the challenges of ML systems that span thousands of machines, traverse global networks, and serve millions of users simultaneously. It shifts the focus from the individual accelerator to the warehouse-scale computer—the Machine Learning Fleet. +Volume II: Advanced Machine Learning Systems addresses the complexities of **building and managing the machine learning fleet**. This volume follows the Hennessy & Patterson pedagogical model, building upon single-node foundations to tackle the challenges of ML systems that span thousands of machines, traverse global networks, and serve millions of users simultaneously. It shifts the focus from the individual accelerator to the warehouse-scale computer—the Machine Learning Fleet. ## What This Volume Covers {#sec-volume-volume-covers-e07b} diff --git a/book/quarto/contents/vol2/frontmatter/foreword.qmd b/book/quarto/contents/vol2/frontmatter/foreword.qmd index 763c4392c..86e1560cd 100644 --- a/book/quarto/contents/vol2/frontmatter/foreword.qmd +++ b/book/quarto/contents/vol2/frontmatter/foreword.qmd @@ -2,7 +2,7 @@ ::: {style="font-style: italic;"} -Volume II: Advanced ML Systems addresses the challenges of building machine learning systems at production scale. While foundational texts often focus on single-machine systems, this volume teaches you to build and operate them across distributed infrastructure. +Volume II: Advanced Machine Learning Systems addresses the challenges of building machine learning systems at production scale. While foundational texts often focus on single-machine systems, this volume teaches you to build and operate them across distributed infrastructure. The transition from single machine to distributed systems represents one of the most significant challenges in ML engineering. Models that train in hours on a single GPU may require weeks without proper parallelization strategies. 
Inference systems that work perfectly in development may fail catastrophically under production load. This volume addresses these challenges systematically. diff --git a/book/quarto/contents/vol2/index.qmd b/book/quarto/contents/vol2/index.qmd index 3420501be..f2bcea80a 100644 --- a/book/quarto/contents/vol2/index.qmd +++ b/book/quarto/contents/vol2/index.qmd @@ -1,8 +1,7 @@ --- format: html: - title: "Machine Learning Systems: Volume II" - subtitle: "Advanced ML Systems" + title: "Advanced Machine Learning Systems" date: today date-format: long doi: "v0.5.1" diff --git a/book/quarto/contents/vol2/parts/summaries.yml b/book/quarto/contents/vol2/parts/summaries.yml index 31aeacba0..ac4651c7a 100644 --- a/book/quarto/contents/vol2/parts/summaries.yml +++ b/book/quarto/contents/vol2/parts/summaries.yml @@ -17,14 +17,14 @@ parts: The main content of the textbook organized into thematic parts. # ============================================================================= - # VOLUME II: ADVANCED ML SYSTEMS + # VOLUME II: ADVANCED MACHINE LEARNING SYSTEMS # ============================================================================= - key: "volume_2" division: "mainmatter" type: "division" numbered: false - title: "Volume II: Advanced ML Systems" + title: "Volume II: Advanced Machine Learning Systems" description: > Volume II explores advanced topics for distributed and production-scale systems, focusing on the infrastructure and coordination required to scale beyond single machines. diff --git a/landing/index.html b/landing/index.html index 5799d44ea..9e1f477f9 100644 --- a/landing/index.html +++ b/landing/index.html @@ -623,7 +623,7 @@
📘
Volume I: Foundations
-Introduction to ML Systems
+Introduction to Machine Learning Systems
Hardware fundamentals, ML basics, training pipelines, and optimization techniques. Start here.
@@ -631,7 +631,7 @@
📙
Volume II: Advanced
-Scaling ML Systems
+Advanced Machine Learning Systems
Distributed training, production deployment, security, responsible AI, and emerging architectures.
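
The two principle callouts renamed in this patch (the Arithmetic Intensity Law and Amdahl's Law) can be sanity-checked numerically. The sketch below is illustrative only and not part of the patch; the accelerator figures (100 TFLOP/s peak, 1 TB/s bandwidth) are assumed example values:

```python
# Illustrative sketch of the two invariants from optimize_principles.qmd.
# Hardware numbers are assumptions for the example, not from the patch.

def roofline_throughput(peak_flops: float, mem_bw: float, intensity: float) -> float:
    """P = min(P_peak, I * B_mem): attainable throughput in FLOP/s."""
    return min(peak_flops, intensity * mem_bw)

def amdahl_speedup(p: float, s: float) -> float:
    """Speedup = 1 / ((1 - p) + p / s) for parallel fraction p accelerated by s."""
    return 1.0 / ((1.0 - p) + p / s)

# Assumed accelerator: 100 TFLOP/s peak compute, 1 TB/s DRAM bandwidth.
peak, bw = 100e12, 1e12
print(roofline_throughput(peak, bw, intensity=10))   # memory-bound: 10 * 1 TB/s = 1e13
print(roofline_throughput(peak, bw, intensity=500))  # compute-bound: capped at 1e14

# Even a 100x accelerator covering 95% of the work caps total speedup near 17x.
print(round(amdahl_speedup(p=0.95, s=100), 1))
```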