diff --git a/quarto/contents/core/ai_for_good/ai_for_good.qmd b/quarto/contents/core/ai_for_good/ai_for_good.qmd index e7ced862b..f728dad12 100644 --- a/quarto/contents/core/ai_for_good/ai_for_good.qmd +++ b/quarto/contents/core/ai_for_good/ai_for_good.qmd @@ -397,7 +397,7 @@ The Distributed Knowledge Pattern enables peer-to-peer learning and coordination The Adaptive Resource Pattern dynamically adjusts computation based on current resource availability. Drawing on the power management and thermal optimization strategies from @sec-ai-acceleration, this pattern implements energy-aware inference scheduling. It proves most effective for deployments with predictable resource patterns such as solar charging cycles and network availability windows. -### Pattern Comparison Framework {#sec-ai-good-pattern-comparison-framework} +### Pattern Comparison Framework {#sec-ai-good-pattern-comparison-framework-5d2b} The four design patterns address different combinations of constraints and operational contexts. @tbl-design-pattern-comparison provides a systematic comparison to guide pattern selection for specific deployment scenarios. @@ -1125,23 +1125,23 @@ Quality degradation management presents ongoing challenges, especially in ML app These limitations necessitate careful system design and implementation strategies. Successful deployments often implement robust monitoring systems, graceful degradation mechanisms, and clear quality thresholds for different resource states. While these challenges don't negate the pattern's utility, they emphasize the importance of thorough planning and realistic performance expectations in adaptive system deployments. 
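The Adaptive Resource Pattern's core mechanism, mapping the current resource state to a quality threshold, can be sketched in a few lines. The model tiers, accuracy figures, and battery thresholds below are illustrative assumptions, not values from any real deployment:

```python
# Sketch of energy-aware model selection for adaptive inference.
# All tier names, accuracies, and thresholds are hypothetical.

MODEL_TIERS = [
    # (min_battery_fraction, model_name, approx_accuracy)
    (0.60, "full_model", 0.92),
    (0.25, "quantized_model", 0.88),
    (0.00, "tiny_distilled_model", 0.81),
]

def select_model(battery_fraction: float) -> str:
    """Pick the highest-quality model the current battery level allows."""
    for min_battery, name, _acc in MODEL_TIERS:
        if battery_fraction >= min_battery:
            return name
    return MODEL_TIERS[-1][1]  # fall back to the cheapest tier
```

A policy like this makes graceful degradation explicit: quality drops in defined steps as resources shrink, rather than failing unpredictably.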
-## Resource-Constrained Learning Theory {#sec-ai-good-theoretical-foundations-resourceconstrained-learning-709d} +## Resource-Constrained Learning Theory {#sec-ai-good-resourceconstrained-learning-theory-b4f7} The design patterns presented above emerge from fundamental theoretical constraints that distinguish resource-limited deployments from conventional ML systems. While the patterns provide practical guidance, understanding their theoretical foundations enables engineers to make principled design decisions and recognize when to adapt or combine patterns for specific contexts. Social good applications reveal fundamental limitations in current machine learning approaches, where resource constraints expose gaps between theoretical learning requirements and practical deployment realities. The training methodologies from @sec-ai-training assumed abundant computational resources, large datasets, and reliable infrastructure. Here, we examine how those foundational principles must be reconsidered when these assumptions fail. -### Sample Complexity in Low-Resource Environments {#sec-ai-good-sample-complexity-lowresource-environments-2068} +### Sample Complexity in Low-Resource Environments {#sec-ai-good-sample-complexity-lowresource-environments-3f27} Traditional supervised learning assumes abundant labeled data, typically requiring 1000+ examples per class to achieve acceptable generalization performance. Resource-constrained environments challenge this assumption, often providing fewer than 100 examples per class while demanding human-level learning efficiency. -#### Few-Shot Learning Requirements {#sec-ai-good-fewshot-learning-requirements-d0b3} +#### Few-Shot Learning Requirements {#sec-ai-good-fewshot-learning-requirements-2179} This challenge becomes concrete in applications like agricultural disease detection. 
While commercial crop monitoring systems train on millions of labeled images from controlled environments, rural deployments must identify diseases using fewer than 50 examples per disease class. This 20× reduction in training data requires fundamentally different learning approaches that leverage structural similarities across disease types and transfer knowledge from related domains. The theoretical gap becomes apparent when comparing learning curves. Traditional deep learning approaches require exponential data scaling to achieve linear improvements in accuracy, following power laws where accuracy ∝ (data_size)^α with α typically 0.1-0.3. Resource-constrained environments require learning algorithms that achieve α ≥ 0.7, approaching human-level sample efficiency where single examples can generalize to entire categories. -#### Information-Theoretic Bounds {#sec-ai-good-informationtheoretic-bounds-6342} +#### Information-Theoretic Bounds {#sec-ai-good-informationtheoretic-bounds-0108} ::: {.callout-note title="Mathematical Depth" collapse="false"} @@ -1161,17 +1161,17 @@ Bridging this gap necessitates learning approaches that exploit additional struc - **Multi-task learning**: Sharing representations across related diseases - **Active learning**: Strategically selecting informative examples for labeling -### Self-Supervised Learning Foundations {#sec-ai-good-selfsupervised-learning-foundations-27d3} +### Self-Supervised Learning Foundations {#sec-ai-good-selfsupervised-learning-foundations-3193} Building on these sample complexity challenges, resource-constrained environments often contain abundant unlabeled data despite scarce labeled examples. Rural health clinics generate thousands of diagnostic images daily, but expert annotations remain limited. Self-supervised learning provides theoretical frameworks for extracting useful representations from this unlabeled data. 
-#### Contrastive Learning Theory {#sec-ai-good-contrastive-learning-theory-2736} +#### Contrastive Learning Theory {#sec-ai-good-contrastive-learning-theory-a8d8} Contrastive approaches learn representations by distinguishing between similar and dissimilar examples without requiring explicit labels. From a systems engineering perspective, this impacts deployment architecture in several ways. Edge devices can collect unlabeled data continuously during normal operation, building local datasets without expensive annotation. Regional servers can then perform contrastive pretraining on aggregated unlabeled data, creating foundation models that edge devices download and fine-tune with their limited labeled examples. This architectural pattern reduces the sample complexity burden by factors of 5-15× compared to training from scratch. For a crop monitoring system, this means a deployment can achieve 87% disease detection accuracy with fewer than 50 labeled examples per disease class, provided it has access to thousands of unlabeled field images. The systems challenge becomes managing this two-stage pipeline—unsupervised pretraining at regional scale followed by supervised fine-tuning at edge scale—within bandwidth and compute constraints. -#### Mutual Information Bounds {#sec-ai-good-mutual-information-bounds-076d} +#### Mutual Information Bounds {#sec-ai-good-mutual-information-bounds-7b7c} To understand these improvements theoretically, information theory provides fundamental limits on how much unlabeled data can compensate for limited labels. The mutual information I(X;Y) between inputs X and labels Y bounds the maximum achievable performance with any learning algorithm. Self-supervised pretraining increases effective mutual information by learning representations that capture task-relevant structure in the input distribution. 
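For small discrete distributions, the mutual information $I(X;Y)$ that bounds achievable performance can be computed exactly, which makes the bound concrete. A minimal sketch, where the toy joint distributions are illustrative rather than field data:

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    mi = 0.0
    for (x, y), p in joint.items():
        if p > 0:
            mi += p * math.log2(p / (px[x] * py[y]))
    return mi

# Perfectly correlated binary input and label: I(X;Y) = 1 bit.
perfect = {(0, 0): 0.5, (1, 1): 0.5}
# Independent input and label: I(X;Y) = 0 bits.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
```

The independent case yields zero bits, matching the intuition that no learning algorithm can extract label information that the inputs do not carry.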
@@ -1180,11 +1180,11 @@ For social good applications, this suggests prioritizing domains where: - Tasks share common underlying structure (related diseases, similar environmental conditions) - Domain expertise can guide representation learning (medical knowledge, agricultural practices) -### Resource-Constrained Optimization Theory {#sec-ai-good-resourceconstrained-optimization-theory-25ab} +### Resource-Constrained Optimization Theory {#sec-ai-good-resourceconstrained-optimization-theory-774b} Moving beyond data availability to optimization challenges, traditional optimization theory assumes abundant computational resources and focuses on convergence rates to global optima. Resource-constrained environments require optimization under strict memory, compute, and energy budgets that fundamentally change theoretical analysis. -#### Communication-Constrained Learning {#sec-ai-good-communicationconstrained-learning-55bb} +#### Communication-Constrained Learning {#sec-ai-good-communicationconstrained-learning-7474} A primary constraint in these environments involves distributed learning, where communication bottlenecks dominate computational costs. Consider federated learning with n edge devices, each with local dataset Di and model parameters θi: @@ -1194,7 +1194,7 @@ A primary constraint in these environments involves distributed learning, where This inversion of traditional assumptions requires new theoretical frameworks where communication efficiency becomes the primary optimization objective. Gradient compression, sparse updates, and local model personalization emerge as theoretically motivated solutions rather than engineering optimizations. -#### Energy-Aware Learning Theory {#sec-ai-good-energyaware-learning-theory-bb14} +#### Energy-Aware Learning Theory {#sec-ai-good-energyaware-learning-theory-0ecb} Battery-powered deployments introduce energy constraints absent from traditional learning theory. 
Each model evaluation consumes measurable energy, creating trade-offs between accuracy and operational lifetime. Theoretical frameworks must incorporate energy budgets as first-class constraints: @@ -1206,13 +1206,13 @@ This leads to energy-aware learning algorithms that explicitly trade accuracy fo These theoretical foundations provide the scientific underpinning for the design patterns presented earlier in this chapter. The three core constraints revealed by this analysis—communication bottlenecks, sample scarcity, and energy limitations—directly motivated the architectural approaches embodied in hierarchical processing, progressive enhancement, distributed knowledge, and adaptive resource patterns. Understanding these mathematical principles enables engineers to make informed adaptations and combinations of patterns based on specific deployment contexts. -## Fallacies and Pitfalls {#sec-ai-good-fallacies-pitfalls} +## Fallacies and Pitfalls {#sec-ai-good-fallacies-pitfalls-2678} While technical constraints dominate the engineering challenges discussed throughout this chapter, sociotechnical deployment pitfalls often determine the ultimate success or failure of AI for social good initiatives. These pitfalls emerge from the intersection of technical systems and social contexts, where engineering assumptions collide with community realities, deployment environments, and organizational dynamics. Understanding these common fallacies enables development teams to anticipate and mitigate risks that traditional software engineering processes may not surface. The pitfalls presented here complement the technical constraints explored earlier by highlighting failure modes that occur even when technical implementations succeed according to engineering metrics. 
-### Technical Superiority Fallacy {#sec-ai-good-technical-superiority-fallacy} +### Technical Superiority Fallacy {#sec-ai-good-technical-superiority-fallacy-ea7e} The assumption that technical performance metrics directly translate to real-world impact represents perhaps the most pervasive fallacy in AI for social good deployments. Development teams often focus exclusively on optimizing accuracy, latency, and throughput while overlooking the sociotechnical factors that determine actual adoption and effectiveness. @@ -1222,7 +1222,7 @@ This fallacy manifests in several common deployment mistakes. Systems designed w The underlying error involves confusing technical optimization with outcome optimization. Technical metrics measure system behavior under controlled conditions, while social impact depends on complex interactions between technology, users, communities, and existing institutional structures. Successful deployments require explicit consideration of adoption barriers, cultural integration challenges, and alignment with community priorities from the earliest design phases. -### Infrastructure Assumption Pitfall {#sec-ai-good-infrastructure-assumption-pitfall} +### Infrastructure Assumption Pitfall {#sec-ai-good-infrastructure-assumption-pitfall-bd90} Even systems explicitly designed for resource-constrained environments often carry hidden assumptions about basic infrastructure availability that prove incorrect in real deployment contexts. These assumptions typically involve network connectivity, power reliability, device maintenance capabilities, and technical support availability. @@ -1234,7 +1234,7 @@ Maintenance and support infrastructure assumptions often prove most critical for These infrastructure pitfalls demand comprehensive deployment context analysis that extends beyond initial technical requirements to examine long-term operational realities. 
Successful systems often incorporate redundancy, graceful degradation, and community-based maintenance approaches that reduce dependency on external infrastructure. -### Community Engagement Oversimplification {#sec-ai-good-community-engagement-oversimplification} +### Community Engagement Oversimplification {#sec-ai-good-community-engagement-oversimplification-4c63} Technical teams frequently underestimate the complexity of community engagement, treating it as an implementation detail rather than a fundamental design constraint that shapes system architecture and deployment strategy. This oversimplification leads to systems that may function technically but fail to integrate meaningfully into community practices and decision-making processes. @@ -1246,7 +1246,7 @@ Power dynamics and consent processes often receive insufficient attention from t The scope of community engagement requirements often exceeds what technical teams anticipate. Effective engagement may require months of relationship-building, multiple community meetings, translation into local languages, adaptation to local communication norms, and ongoing consultation throughout development and deployment. These requirements have direct implications for project timelines, budgets, and technical architectures that must accommodate evolving community priorities. -### Data Colonialism and Extractive Practices {#sec-ai-good-data-colonialism-extractive} +### Data Colonialism and Extractive Practices {#sec-ai-good-data-colonialism-extractive-practices-0ab5} AI for social good initiatives can inadvertently perpetuate extractive relationships where communities provide data and labor while external organizations capture value and control system evolution. These dynamics represent serious ethical pitfalls with long-term implications for community autonomy and technology justice. 

@@ -1260,7 +1260,7 @@ Local economic impacts require careful consideration to avoid extractive outcome Addressing extractive potential requires intentional design for community ownership, local capacity building, and economic sustainability. Technical architectures should support local data processing, transparent algorithms, and community-controlled system evolution. Economic models should ensure value capture benefits communities directly rather than flowing primarily to external technology organizations. -### Sustainability Myopia {#sec-ai-good-sustainability-myopia} +### Sustainability Myopia {#sec-ai-good-sustainability-myopia-af98} Many AI for social good projects demonstrate sustainability myopia by focusing primarily on initial deployment success while inadequately planning for long-term viability. This short-term perspective creates systems that may achieve impressive early results but fail to establish sustainable operations, maintenance, and evolution pathways. diff --git a/quarto/contents/core/benchmarking/benchmarking.qmd b/quarto/contents/core/benchmarking/benchmarking.qmd index cbf95bc00..a4a52a918 100644 --- a/quarto/contents/core/benchmarking/benchmarking.qmd +++ b/quarto/contents/core/benchmarking/benchmarking.qmd @@ -560,12 +560,12 @@ As shown in @tbl-benchmark-comparison, different challenges emerge at different \node[below right,color=BlueLine] at (macro) {\textbf{Macro-benchmarks}}; \node[below right,color=GreenLine] at (endtoend) {\textbf{End-to-End benchmarks}}; -% Add trend line -\draw[dashed,thick,gray!60] (1,4.2) -- (4.8,0.8); +% Add trend line (passes through all three points) +\draw[dashed,thick,gray!60] (1.5,4) -- (4.5,1); % Add axis labels -\node[rotate=90,above] at (-0.3,2.5) {High}; -\node[rotate=90,below] at (-0.3,0.5) {Low}; +\node[rotate=90] at (-0.5,4.5) {High}; +\node[rotate=90] at (-0.5,0.2) {Low}; \node[below] at (0.5,-0.3) {Low}; \node[below] at (5,-0.3) {High}; @@ -905,7 +905,7 @@ The benchmark reveals inherent trade-offs 
between performance metrics in machine Ultimately, whether these measurements constitute a "passing" benchmark depends on the specific requirements of the intended application. The benchmark framework provides the structure and methodology for consistent evaluation, while the acceptance criteria must align with deployment constraints and performance requirements. -### Compression Benchmarks {#sec-benchmarking-ai-neural-network-compression-benchmarking-a4c0} +### Compression Benchmarks {#sec-benchmarking-ai-compression-benchmarks-42c9} Extending beyond general benchmarking principles, as machine learning models continue to grow in size and complexity, neural network compression has emerged as a critical optimization technique for deployment across resource-constrained environments. Compression benchmarking methodologies evaluate the effectiveness of techniques including pruning, quantization, knowledge distillation, and architecture optimization. These specialized benchmarks measure the fundamental trade-offs between model size reduction, accuracy preservation, and computational efficiency improvements. @@ -921,7 +921,7 @@ Finally, acceleration factor measurements for optimized models reveal the practi Efficiency-aware benchmarking addresses critical gaps in traditional evaluation frameworks. Current benchmark suites like MLPerf focus primarily on dense, unoptimized models that do not represent production deployments, where optimized models are ubiquitous. Future benchmarking frameworks should include efficiency model divisions specifically evaluating optimized architectures, reduced-precision inference, and compact models to accurately reflect real deployment practices and guide efficiency research toward practical impact. 
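The trade-offs that compression benchmarks measure reduce to a few ratio metrics: size reduction, accuracy preservation, and acceleration. A minimal sketch of how a benchmark harness might report them, using made-up model numbers for illustration:

```python
def compression_metrics(base_size_mb, comp_size_mb,
                        base_acc, comp_acc,
                        base_lat_ms, comp_lat_ms):
    """Core trade-off metrics a compression benchmark reports."""
    return {
        "compression_ratio": base_size_mb / comp_size_mb,
        "accuracy_retention": comp_acc / base_acc,
        "speedup": base_lat_ms / comp_lat_ms,
    }

# Hypothetical dense baseline vs. INT8-quantized variant.
m = compression_metrics(base_size_mb=400, comp_size_mb=100,
                        base_acc=0.760, comp_acc=0.752,
                        base_lat_ms=20.0, comp_lat_ms=8.0)
```

Reporting all three together prevents the common mistake of quoting a compression ratio without disclosing the accuracy it cost.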
-### Mobile & Edge Benchmarks {#sec-benchmarking-ai-mobile-heterogeneous-ai-benchmarking-c015} +### Mobile & Edge Benchmarks {#sec-benchmarking-ai-mobile-edge-benchmarks-7f78} Mobile SoCs integrate heterogeneous processors (CPU, GPU, DSP, NPU) requiring specialized benchmarking that captures workload distribution complexity while accounting for thermal and battery constraints. Effective processor coordination achieves 3-5x performance improvements, but sustained workloads trigger thermal throttling—Snapdragon 8 Gen 3 drops from 35 TOPS peak to 20 TOPS sustained. Battery impact varies dramatically: computational photography consumes 2-5W while background AI requires 5-50mW for acceptable endurance. @@ -929,7 +929,7 @@ Mobile benchmarking must also evaluate 5G/WiFi edge-cloud coordination, with URL [^fn-urllc]: **URLLC**: 5G service category requiring 99.999% reliability and <1ms latency for mission-critical applications. -## Training vs. Inference Framework {#sec-benchmarking-ai-training-vs-inference-comparative-framework-4795} +## Training vs. Inference Framework {#sec-benchmarking-ai-training-vs-inference-framework-611c} The benchmark components and granularity levels apply differently to ML systems' two fundamental operational phases: training and inference. While both phases process data through neural networks, their contrasting objectives create distinct benchmarking requirements. The training methodologies from @sec-ai-training focus on iterative optimization over large datasets, while deployment strategies from @sec-ml-operations prioritize consistent, low-latency serving. These differences cascade through metric selection, resource allocation, and scaling behavior. @@ -969,7 +969,7 @@ Furthermore, in many cases, using a single hardware accelerator, such as a singl To illustrate these benchmarking principles, we will reference [MLPerf Training](https://mlcommons.org/benchmarks/training/) throughout this section. 
MLPerf, introduced earlier in @sec-benchmarking-ai-historical-context-1c54, provides the standardized framework we reference throughout this analysis of training benchmarks. -### Training Benchmark Motivation {#sec-benchmarking-ai-motivation-53a4} +### Training Benchmark Motivation {#sec-benchmarking-ai-training-benchmark-motivation-1224} From a systems perspective, training machine learning models represents a computationally intensive process that requires careful optimization of resources. Training benchmarks serve as essential tools for evaluating system efficiency, identifying bottlenecks, and ensuring that machine learning systems can scale effectively. They provide a standardized approach to measuring how various system components, including hardware accelerators, memory, storage, and network infrastructure, affect training performance. @@ -1165,7 +1165,7 @@ Y,Date **MLPerf Training Progress**: Standardized benchmarks reveal that machine learning training performance consistently surpasses Moore's law, indicating substantial gains from systems-level optimizations. These trends emphasize how focused measurement and iterative improvement drive rapid advancements in ML training efficiency and scalability. Source: [@tschand2024mlperf]. ::: -#### Importance of Training Benchmarks {#sec-benchmarking-ai-importance-training-benchmarks-f74a} +#### Importance of Training Benchmarks {#sec-benchmarking-ai-importance-training-benchmarks-5d95} As machine learning models grow in complexity, training becomes increasingly demanding in terms of compute power, memory, and data storage. The ability to measure and compare training efficiency is critical to ensuring that systems can effectively handle large-scale workloads. Training benchmarks provide a structured methodology for assessing performance across different hardware platforms, software frameworks, and optimization techniques. 
@@ -1173,7 +1173,7 @@ One of the fundamental challenges in training machine learning models is the eff Training benchmarks help uncover such inefficiencies by measuring key performance indicators, including system throughput, time-to-accuracy, and hardware utilization. Recall from @sec-ai-acceleration that GPUs achieve approximately 15,700 GFLOPS for mixed-precision operations while TPUs deliver 275,000 INT8 operations per second for specialized tensor workloads. Training benchmarks enable us to measure whether these theoretical hardware capabilities translate to actual training speedups under realistic conditions. These benchmarks allow practitioners to analyze whether accelerators are being leveraged effectively or whether specific bottlenecks, such as memory bandwidth constraints from @sec-ai-acceleration or inefficient data pipelines from @sec-data-engineering, are reducing overall system performance. For example, a system using TF32 precision may achieve higher throughput than one using FP32, but if TF32 introduces numerical instability that increases the number of iterations required to reach the target accuracy, the overall training time may be longer. By providing insights into these factors, benchmarks support the design of more efficient training workflows that maximize hardware potential while minimizing unnecessary computation. -#### Hardware & Software Optimization {#sec-benchmarking-ai-hardware-software-optimization-750c} +#### Hardware & Software Optimization {#sec-benchmarking-ai-hardware-software-optimization-4f19} The performance of machine learning training is heavily influenced by the choice of hardware and software. Training benchmarks guide system designers in selecting optimal configurations by measuring how different architectures, including GPUs, TPUs, and emerging AI accelerators, handle computational workloads. 
These benchmarks also evaluate how well deep learning frameworks, such as TensorFlow and PyTorch, optimize performance across different hardware setups. @@ -1183,7 +1183,7 @@ Beyond hardware selection, training benchmarks also inform software optimization [^fn-bench-mixed-precision]: **Mixed-Precision Training**: A training technique that uses both 16-bit (FP16) and 32-bit (FP32) floating-point representations to accelerate training while maintaining model accuracy. Introduced by NVIDIA in 2017, mixed precision can achieve 1.5-2x speedups on modern GPUs with Tensor Cores while reducing memory usage by ~40%, enabling larger batch sizes and faster convergence for large models. -#### Scalability & Efficiency {#sec-benchmarking-ai-scalability-efficiency-05f3} +#### Scalability & Efficiency {#sec-benchmarking-ai-scalability-efficiency-18ff} As machine learning workloads continue to grow, efficient scaling across distributed computing environments has become a key concern. Many modern deep learning models are trained across multiple GPUs or TPUs, requiring efficient parallelization strategies to ensure that additional computing resources lead to meaningful performance improvements. Training benchmarks measure how well a system scales by evaluating system throughput, memory efficiency, and overall training time as additional computational resources are introduced. @@ -1191,7 +1191,7 @@ Effective scaling is not always guaranteed. While adding more GPUs or TPUs shoul Another crucial factor in training efficiency is time-to-accuracy, which measures how quickly a model reaches a target accuracy level. This metric bridges the algorithmic and system dimensions of our framework, connecting model convergence characteristics with computational efficiency. By leveraging training benchmarks, system designers can assess whether their infrastructure is capable of handling large-scale workloads efficiently while maintaining training stability and accuracy. 
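Deviation from near-linear scaling can be quantified as a scaling-efficiency ratio: the achieved speedup divided by the ideal speedup. A minimal sketch, with hypothetical time-to-accuracy figures chosen only for illustration:

```python
def scaling_efficiency(time_1x, time_nx, n_workers):
    """Fraction of ideal linear speedup achieved when scaling to n workers."""
    speedup = time_1x / time_nx
    return speedup / n_workers

# Hypothetical run: 8 GPUs cut time-to-accuracy from 32 h to 5 h
# rather than the ideal 4 h, so 20% of the ideal speedup is lost
# to communication overhead and parallelization inefficiency.
eff = scaling_efficiency(time_1x=32.0, time_nx=5.0, n_workers=8)
```

Tracking this ratio as workers are added exposes the point at which communication overhead begins to dominate and further scale-out stops paying off.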
-#### Cost & Energy Factors {#sec-benchmarking-ai-cost-energy-factors-1d9e} +#### Cost & Energy Factors {#sec-benchmarking-ai-cost-energy-factors-8e47} The computational cost of training large-scale models has risen sharply in recent years, making cost-efficiency a critical consideration. Training a model such as GPT-3 can require millions of dollars in cloud computing resources, making it imperative to evaluate cost-effectiveness across different hardware and software configurations. Training benchmarks provide a means to quantify the cost per training run by analyzing computational expenses, cloud pricing models, and energy consumption. @@ -1199,7 +1199,7 @@ Beyond financial cost, energy efficiency has become an increasingly important me For example, MLPerf includes an energy benchmarking component that tracks the power consumption of various hardware accelerators during training. This allows researchers to compare different computing platforms not only in terms of raw performance but also in terms of their environmental impact. By integrating energy efficiency metrics into benchmarking studies, organizations can design AI systems that balance computational power with sustainability goals. -#### Fair ML Systems Comparison {#sec-benchmarking-ai-fair-ml-systems-comparison-562e} +#### Fair ML Systems Comparison {#sec-benchmarking-ai-fair-ml-systems-comparison-cd73} One of the primary functions of training benchmarks is to establish a standardized framework for comparing ML systems. Given the wide variety of hardware architectures, deep learning frameworks, and optimization techniques available today, ensuring fair and reproducible comparisons is essential. @@ -1207,13 +1207,13 @@ Standardized benchmarks provide a common evaluation methodology, allowing resear This standardized approach addresses reproducibility concerns in machine learning research by providing clearly defined evaluation methodologies. 
Results can be consistently reproduced across different computing environments, enabling researchers to make informed decisions when selecting hardware, software, and training methodologies while driving systematic progress in AI systems development. -### Training Metrics {#sec-benchmarking-ai-metrics-37b0} +### Training Metrics {#sec-benchmarking-ai-training-metrics-dc97} Evaluating the performance of machine learning training requires a set of well-defined metrics that go beyond conventional algorithmic measures. From a systems perspective, training benchmarks assess how efficiently and effectively a machine learning model can be trained to a predefined accuracy threshold. Metrics such as throughput, scalability, and energy efficiency are only meaningful in relation to whether the model successfully reaches its target accuracy. Without this constraint, optimizing for raw speed or resource utilization may lead to misleading conclusions. Training benchmarks, such as MLPerf Training, define specific accuracy targets for different machine learning tasks, ensuring that performance measurements are made in a fair and reproducible manner. A system that trains a model quickly but fails to reach the required accuracy is not considered a valid benchmark result. Conversely, a system that achieves the best possible accuracy but takes an excessive amount of time or resources may not be practically useful. Effective benchmarking requires balancing speed, efficiency, and accuracy convergence. -#### Time and Throughput {#sec-benchmarking-ai-time-throughput-4270} +#### Time and Throughput {#sec-benchmarking-ai-time-throughput-cc05} One of the fundamental metrics for evaluating training efficiency is the time required to reach a predefined accuracy threshold. Training time ($T_{\text{train}}$) measures how long a model takes to converge to an acceptable performance level, reflecting the overall computational efficiency of the system. 
It is formally defined as: $$ @@ -1230,7 +1230,7 @@ where $N_{\text{samples}}$ is the total number of training samples processed. Ho For example, in MLPerf Training, the benchmark for ResNet-50 may require reaching an accuracy target like 75.9% top-1 on the ImageNet dataset. A system that processes 10,000 images per second but fails to achieve this accuracy is not considered a valid benchmark result, while a system that processes fewer images per second but converges efficiently is preferable. This highlights why throughput must always be evaluated in relation to time-to-accuracy rather than as an independent performance measure. -#### Scalability & Parallelism {#sec-benchmarking-ai-scalability-parallelism-ea91} +#### Scalability & Parallelism {#sec-benchmarking-ai-scalability-parallelism-cbc4} As machine learning models increase in size, training workloads often require distributed computing across multiple processors or accelerators. Scalability measures how effectively training performance improves as more computational resources are added. An ideal system should exhibit near-linear scaling, where doubling the number of GPUs or TPUs leads to a proportional reduction in training time. However, real-world performance is often constrained by factors such as communication overhead, memory bandwidth limitations, and inefficiencies in parallelization strategies. @@ -1242,7 +1242,7 @@ Parallelism in training is categorized into data parallelism[^fn-data-parallel], [^fn-model-parallel]: **Model Parallelism**: A distributed training approach where different parts of the neural network are placed on different GPUs, essential for models too large to fit in a single GPU's memory. GPT-3's 175B parameters required model parallelism across multiple nodes, as even high-memory GPUs can only hold ~40B parameters in mixed precision. 
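The rule that throughput only counts when the accuracy target is met can be expressed directly. Using the chapter's ResNet-50 example (a 75.9% top-1 target and a system processing 10,000 images per second), a sketch of that validity check, with hypothetical function and field names:

```python
def benchmark_result(samples_processed, train_seconds, final_acc, target_acc):
    """Report throughput, but mark the run invalid if it missed the target."""
    return {
        "throughput_sps": samples_processed / train_seconds,  # samples/sec
        "valid": final_acc >= target_acc,
    }

# Fast run that fails to converge to 75.9% top-1: invalid result.
fast_but_short = benchmark_result(3.6e6, 360.0,
                                  final_acc=0.741, target_acc=0.759)
# Slower run that reaches the target: the preferable, valid result.
slower_valid = benchmark_result(1.8e6, 360.0,
                                final_acc=0.761, target_acc=0.759)
```

Coupling the two fields in one record keeps raw throughput from being quoted without its accuracy qualification.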
-#### Resource Utilization {#sec-benchmarking-ai-resource-utilization-3c96} +#### Resource Utilization {#sec-benchmarking-ai-resource-utilization-20c7} The efficiency of machine learning training depends not only on speed and scalability but also on how well available hardware resources are utilized. Compute utilization measures the extent to which processing units, such as GPUs or TPUs, are actively engaged during training. Low utilization may indicate bottlenecks in data movement, memory access, or inefficient workload scheduling. @@ -1252,7 +1252,7 @@ Memory bandwidth is another critical factor, as deep learning models require fre I/O performance also plays a significant role in training efficiency, particularly when working with large datasets that cannot fit entirely in memory. Benchmarks evaluate the efficiency of data loading pipelines, including preprocessing operations, caching mechanisms, and storage retrieval speeds. Systems that fail to optimize data loading can experience significant slowdowns, regardless of computational power. -#### Energy Efficiency & Cost {#sec-benchmarking-ai-energy-efficiency-cost-2952} +#### Energy Efficiency & Cost {#sec-benchmarking-ai-energy-efficiency-cost-c03c} Training large-scale machine learning models requires substantial computational resources, leading to significant energy consumption and financial costs. Energy efficiency metrics quantify the power usage of training workloads, helping identify systems that optimize computational efficiency while minimizing energy waste. The increasing focus on sustainability has led to the inclusion of energy-based benchmarks, such as those in MLPerf Training, which measure power consumption per training run. @@ -1260,13 +1260,13 @@ Training GPT-3 was estimated to consume 1,287 MWh of electricity [@strubell2019e Cost considerations extend beyond electricity usage to include hardware expenses, cloud computing costs, and infrastructure maintenance. 
Training benchmarks provide insights into the cost-effectiveness of different hardware and software configurations by measuring training time in relation to resource expenditure. Organizations can use these benchmarks to balance performance and budget constraints when selecting training infrastructure. -#### Fault Tolerance & Robustness {#sec-benchmarking-ai-fault-tolerance-robustness-fe50} +#### Fault Tolerance & Robustness {#sec-benchmarking-ai-fault-tolerance-robustness-0cf1} Training workloads often run for extended periods, sometimes spanning days or weeks, making fault tolerance an essential consideration. A robust system must be capable of handling unexpected failures, including hardware malfunctions, network disruptions, and memory errors, without compromising accuracy convergence. In large-scale cloud-based training, node failures are common due to hardware instability. If a GPU node in a distributed cluster fails, training must continue without corrupting the model. MLPerf Training includes evaluations of fault-tolerant training strategies, such as checkpointing, where models periodically save their progress. This ensures that failures do not require restarting the entire training process. -#### Reproducibility & Standardization {#sec-benchmarking-ai-reproducibility-standardization-50cc} +#### Reproducibility & Standardization {#sec-benchmarking-ai-reproducibility-standardization-cbd1} For benchmarks to be meaningful, results must be reproducible across different runs, hardware platforms, and software frameworks. Variability in training results can arise due to stochastic processes, hardware differences, and software optimizations. Ensuring reproducibility requires standardizing evaluation protocols, controlling for randomness in model initialization, and enforcing consistency in dataset processing. 
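The checkpointing strategy described above can be sketched minimally. This illustration persists a toy training state to a JSON file so a restart resumes from the last saved step; real systems checkpoint full model and optimizer state through framework-native APIs, and the state fields here are hypothetical:

```python
import json
import os
import tempfile

# Minimal checkpoint/resume loop (illustrative). A crash between checkpoints
# loses at most `checkpoint_every` steps of work, not the whole run.

def train_with_checkpoints(total_steps, ckpt_path, checkpoint_every=100):
    state = {"step": 0, "loss": 1.0}
    if os.path.exists(ckpt_path):          # resume if a checkpoint exists
        with open(ckpt_path) as f:
            state = json.load(f)
    while state["step"] < total_steps:
        state["step"] += 1
        state["loss"] *= 0.999             # stand-in for a real update
        if state["step"] % checkpoint_every == 0:
            with open(ckpt_path, "w") as f:
                json.dump(state, f)        # progress survives a failure
    return state

ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
done = train_with_checkpoints(250, ckpt)
print(done["step"])  # 250; a restart would resume from step 200, the last checkpoint
```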
@@ -1431,7 +1431,7 @@ ggplot(power_data, aes(x = SystemType)) + As with training, we will reference MLPerf Inference throughout this section to illustrate benchmarking principles. MLPerf's inference benchmarks, building on the foundation established in @sec-benchmarking-ai-historical-context-1c54, provide standardized evaluation across deployment scenarios from cloud to edge devices. -### Inference Benchmark Motivation {#sec-benchmarking-ai-motivation-2afc} +### Inference Benchmark Motivation {#sec-benchmarking-ai-inference-benchmark-motivation-9d45} Deploying machine learning models for inference introduces a unique set of challenges distinct from training. While training optimizes large-scale computation over extensive datasets, inference must deliver predictions efficiently and at scale in real-world environments. Inference benchmarks evaluate deployment-specific performance challenges, identifying bottlenecks that emerge when models transition from development to production serving. @@ -1439,7 +1439,7 @@ Unlike training, which typically runs on dedicated high-performance hardware, in Inference benchmarks help answer fundamental questions about model deployment. How quickly can a model generate predictions in real-world conditions? What are the trade-offs between inference speed and accuracy? Can an inference system handle increasing demand while maintaining low latency? By evaluating these factors, benchmarks guide optimizations in both hardware and software to improve overall efficiency [@reddi2020mlperf]. -#### Importance of Inference Benchmarks {#sec-benchmarking-ai-importance-inference-benchmarks-43cd} +#### Importance of Inference Benchmarks {#sec-benchmarking-ai-importance-inference-benchmarks-2774} Inference plays a critical role in AI applications, where performance directly affects usability and cost. Unlike training, which is often performed offline, inference typically operates in real-time or near real-time, making latency a primary concern. 
A self-driving car processing camera feeds must react within milliseconds, while a voice assistant generating responses should feel instantaneous to users. @@ -1447,7 +1447,7 @@ Different applications impose varying constraints on inference. Some workloads r A key difference between training and inference is that inference workloads often run continuously in production, meaning that small inefficiencies can compound over time. Unlike a training job that runs once and completes, an inference system deployed in the cloud may serve millions of queries daily, and a model running on a smartphone must manage battery consumption over extended use. Benchmarks provide a structured way to measure inference efficiency under these real-world constraints, helping developers make informed choices about model optimization, hardware selection, and deployment strategies. -#### Hardware & Software Optimization {#sec-benchmarking-ai-hardware-software-optimization-8c4d} +#### Hardware & Software Optimization {#sec-benchmarking-ai-hardware-software-optimization-6728} Efficient inference depends on both hardware acceleration and software optimizations. While GPUs and TPUs dominate training, inference is more diverse in its hardware needs. A cloud-based AI service might leverage powerful accelerators for large-scale workloads, whereas mobile devices rely on specialized inference chips like NPUs or optimized CPU execution. On embedded systems, where resources are constrained, achieving high performance requires careful memory and compute efficiency. Benchmarks help evaluate how well different hardware platforms handle inference workloads, guiding deployment decisions. @@ -1461,7 +1461,7 @@ Software optimizations are just as important. Frameworks like TensorRT[^fn-tenso [^fn-operator-fusion]: **Operator Fusion**: A compiler optimization technique that combines multiple neural network operations into single kernels to reduce memory bandwidth requirements and improve cache efficiency. 
For example, fusing convolution with batch normalization and ReLU can eliminate intermediate memory writes, achieving 20-40% speedups in inference workloads. -#### Scalability & Efficiency {#sec-benchmarking-ai-scalability-efficiency-7f21} +#### Scalability & Efficiency {#sec-benchmarking-ai-scalability-efficiency-ddbb} Inference workloads vary significantly in their scaling requirements. A cloud-based AI system handling millions of queries per second must ensure that increasing demand does not cause delays, while a mobile application running a model locally must execute quickly even under power constraints. Unlike training, which is typically performed on a fixed set of high-performance machines, inference must scale dynamically based on usage patterns and available computational resources. @@ -1469,23 +1469,23 @@ Benchmarks evaluate how inference systems scale under different conditions. They Another key factor in inference efficiency is cold-start performance, the time it takes for a model to load and begin processing queries. This is especially relevant for applications that do not run inference continuously but instead load models on demand. Benchmarks help determine whether a system can quickly transition from idle to active execution without significant overhead. -#### Cost & Energy Factors {#sec-benchmarking-ai-cost-energy-factors-24bc} +#### Cost & Energy Factors {#sec-benchmarking-ai-cost-energy-factors-b86f} Because inference workloads run continuously, operational cost and energy efficiency are critical factors. Unlike training, where compute costs are incurred once, inference costs accumulate over time as models are deployed in production. Running an inefficient model at scale can significantly increase cloud compute expenses, while an inefficient mobile inference system can drain battery life quickly. Benchmarks provide insights into cost per inference request, helping organizations optimize for both performance and affordability. 
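The cost-per-request idea can be made concrete with a back-of-envelope model. All prices and throughput figures below are hypothetical:

```python
# Back-of-envelope serving cost model (instance price and QPS are hypothetical).

def cost_per_million_requests(instance_cost_per_hour: float, qps: float) -> float:
    requests_per_hour = qps * 3600
    return instance_cost_per_hour / requests_per_hour * 1_000_000

# A $3.00/hour accelerator instance sustaining 500 queries per second:
print(f"${cost_per_million_requests(3.00, 500):.2f} per million requests")  # $1.67
```

Because inference runs continuously, halving throughput doubles this cost, which is why per-request efficiency dominates serving economics at scale.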
Energy efficiency is also a growing concern, particularly for mobile and edge AI applications. Many inference workloads run on battery-powered devices, where excessive computation can impact usability. A model running on a smartphone, for example, must be optimized to minimize power consumption while maintaining responsiveness. Benchmarks help evaluate inference efficiency per watt, ensuring that models can operate sustainably across different platforms. -#### Fair ML Systems Comparison {#sec-benchmarking-ai-fair-ml-systems-comparison-3666} +#### Fair ML Systems Comparison {#sec-benchmarking-ai-fair-ml-systems-comparison-bdf8} Applying the standardized evaluation principles established for training benchmarks, inference evaluation requires the same rigorous comparison methodologies. MLPerf Inference extends these principles to deployment scenarios, defining evaluation criteria for tasks such as image classification, object detection, and speech recognition across different hardware platforms and optimization techniques. This ensures that inference performance comparisons remain meaningful and reproducible while accounting for deployment-specific constraints like latency requirements and energy efficiency. -### Inference Metrics {#sec-benchmarking-ai-metrics-4f14} +### Inference Metrics {#sec-benchmarking-ai-inference-metrics-34bd} Evaluating the performance of inference systems requires a distinct set of metrics from those used for training. While training benchmarks emphasize throughput, scalability, and time-to-accuracy, inference benchmarks must focus on latency, efficiency, and resource utilization in practical deployment settings. These metrics ensure that machine learning models perform well across different environments, from cloud data centers handling millions of requests to mobile and edge devices operating under strict power and memory constraints. 
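The efficiency-per-watt measure mentioned above is often expressed as inferences per joule. A minimal sketch, with illustrative power and throughput numbers:

```python
# Inferences per joule as an energy-efficiency metric (numbers illustrative).

def inferences_per_joule(throughput_qps: float, avg_power_watts: float) -> float:
    # A watt is one joule per second, so QPS / W gives inferences per joule.
    return throughput_qps / avg_power_watts

edge = inferences_per_joule(30, 5)        # hypothetical edge NPU: 30 QPS at 5 W
server = inferences_per_joule(2000, 400)  # hypothetical server GPU: 2000 QPS at 400 W
print(edge, server)  # 6.0 vs 5.0 inferences/J
```

On this (fabricated) comparison the slower edge device is the more energy-efficient platform, illustrating why raw throughput alone misleads.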
Unlike training benchmarks that emphasize throughput and time-to-accuracy as established earlier, inference benchmarks evaluate how efficiently a trained model can process inputs and generate predictions at scale. The following sections describe the most important inference benchmarking metrics, explaining their relevance and how they are used to compare different systems. -#### Latency & Tail Latency {#sec-benchmarking-ai-latency-tail-latency-167f} +#### Latency & Tail Latency {#sec-benchmarking-ai-latency-tail-latency-d5dc} Latency is one of the most critical performance metrics for inference, particularly in real-time applications where delays can negatively impact user experience or system safety. Latency refers to the time taken for an inference system to process an input and produce a prediction. While the average latency of a system is useful, it does not capture performance in high-demand scenarios where occasional delays can degrade reliability. @@ -1495,7 +1495,7 @@ Tail latency's connection to user experience at scale becomes critical in produc Service level objectives (SLOs) in production systems therefore focus on tail latency rather than mean latency to ensure consistent user experience. Typical production SLOs specify P95 < 100ms and P99 < 500ms for interactive services, recognizing that occasional slow responses have disproportionate impact on user satisfaction. Large-scale systems like Netflix and Uber optimize for P99.9 latency to handle traffic spikes and infrastructure variations that affect service reliability. -#### Throughput & Batch Efficiency {#sec-benchmarking-ai-throughput-batch-processing-efficiency-abd3} +#### Throughput & Batch Efficiency {#sec-benchmarking-ai-throughput-batch-efficiency-91d2} While latency measures the speed of individual inference requests, throughput measures how many inference requests a system can process per second. It is typically expressed in queries per second (QPS) or frames per second (FPS) for vision tasks. 
Some inference systems operate on a single-instance basis, where each input is processed independently as soon as it arrives. Other systems process multiple inputs in parallel using batch inference, which can significantly improve efficiency by leveraging hardware optimizations. @@ -1503,7 +1503,7 @@ For example, cloud-based services handling millions of queries per second benefi Benchmarks must consider both single-instance and batch throughput to provide a comprehensive understanding of inference performance across different deployment scenarios. -#### Precision & Accuracy Trade-offs {#sec-benchmarking-ai-precision-accuracy-tradeoffs-a115} +#### Precision & Accuracy Trade-offs {#sec-benchmarking-ai-precision-accuracy-tradeoffs-828e} Optimizing inference performance often involves reducing numerical precision, which can significantly accelerate computation while reducing memory and energy consumption. However, lower-precision calculations can introduce accuracy degradation, making it essential to benchmark the trade-offs between speed and predictive quality. @@ -1517,13 +1517,13 @@ Inference benchmarks evaluate how well models perform under different numerical [^fn-model-compression]: **Model Compression**: Techniques to reduce model size and computational requirements including precision reduction (reducing numerical precision), structural optimization (removing unnecessary parameters), knowledge transfer (training smaller models to mimic larger ones), and tensor decomposition. These methods can achieve 10-100x size reduction while maintaining 90-99% of original accuracy. -#### Memory Footprint & Model Size {#sec-benchmarking-ai-memory-footprint-model-size-6520} +#### Memory Footprint & Model Size {#sec-benchmarking-ai-memory-footprint-model-size-8176} Beyond computational optimizations, memory footprint is another critical consideration for inference systems, particularly for devices with limited resources. 
Efficient inference depends not only on speed but also on memory usage. Unlike training, where large models can be distributed across powerful GPUs or TPUs, inference often requires models to run within strict memory budgets. The total model size determines how much storage is required for deployment, while RAM usage reflects the working memory needed during execution. Some models require large memory bandwidth to efficiently transfer data between processing units, which can become a bottleneck if the hardware lacks sufficient capacity. Inference benchmarks evaluate these factors to ensure that models can be deployed effectively across a range of devices. A model that achieves high accuracy but exceeds memory constraints may be impractical for real-world use. To address this, various compression techniques are often applied to reduce model size while maintaining accuracy. Benchmarks help assess whether these optimizations strike the right balance between memory efficiency and predictive performance. -#### Cold-Start & Model Load Time {#sec-benchmarking-ai-coldstart-model-load-time-827f} +#### Cold-Start & Model Load Time {#sec-benchmarking-ai-coldstart-model-load-time-ec33} Once memory requirements are optimized, cold-start performance becomes critical for ensuring inference systems are ready to respond quickly upon deployment. In many deployment scenarios, models are not always kept in memory but instead loaded on demand when needed. This can introduce significant delays, particularly in serverless AI environments[^fn-serverless-ai], where resources are allocated dynamically based on incoming requests. Cold-start performance measures how quickly a system can transition from idle to active execution, ensuring that inference is available without excessive wait times. 
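The storage side of the memory footprint discussed above can be estimated directly from parameter count and numerical precision. The parameter count below is illustrative (roughly ResNet-50 scale):

```python
# Estimating model storage footprint from parameter count and precision.
# The 25M-parameter figure is an illustrative, ResNet-50-scale assumption.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def model_size_mb(num_params: int, dtype: str) -> float:
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)

params = 25_000_000
for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: {model_size_mb(params, dtype):.0f} MB")
# fp32 is ~95 MB; int8 quantization cuts storage 4x relative to fp32.
```

Runtime RAM usage adds activations and buffers on top of this, so the figure is a lower bound on working memory.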
@@ -1531,7 +1531,7 @@ Once memory requirements are optimized Model load time refers to the duration required to load a trained model into memory before it can process inputs. In some cases, particularly on resource-limited devices, models must be reloaded frequently to free up memory for other applications. The time taken for the first inference request is also an important consideration, as it reflects the total delay users experience when interacting with an AI-powered service. Benchmarks help quantify these delays, ensuring that inference systems can meet real-world responsiveness requirements. -#### Dynamic Workload Scaling {#sec-benchmarking-ai-scalability-dynamic-workload-handling-5c06} +#### Dynamic Workload Scaling {#sec-benchmarking-ai-dynamic-workload-scaling-53c9} While cold-start latency addresses initial responsiveness, scalability ensures that inference systems can handle fluctuating workloads and concurrent demands over time. Inference workloads must scale effectively across different usage patterns. In cloud-based AI services, this means efficiently handling millions of concurrent users, while on mobile or embedded devices, it involves managing multiple AI models running simultaneously without overloading the system. @@ -1539,7 +1539,7 @@ Scalability measures how well inference performance improves when additional computational resources are allocated. For cloud-based AI, benchmarks evaluate how efficiently a system handles fluctuating demand, ensuring that inference servers can dynamically allocate resources without compromising latency. In mobile and embedded AI, efficient multi-model execution is essential for running multiple AI-powered features simultaneously without degrading system performance.
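Measuring model load time and first-inference latency, as described in the cold-start discussion above, can be sketched with standard-library tools; the pickled dictionary below is a stand-in for a real serialized model:

```python
import os
import pickle
import tempfile
import time

# Cold-start measurement sketch: the "model" is a stand-in dict of weights;
# real benchmarks load actual serialized models and run real inference.
weights = {"layer%d" % i: [0.0] * 10_000 for i in range(50)}
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(weights, f)

t0 = time.perf_counter()
with open(path, "rb") as f:
    model = pickle.load(f)              # cold start: load from storage
load_ms = (time.perf_counter() - t0) * 1000

t0 = time.perf_counter()
_ = sum(model["layer0"][:100])          # stand-in for the first inference
first_infer_ms = (time.perf_counter() - t0) * 1000

print(f"load: {load_ms:.2f} ms, first inference: {first_infer_ms:.4f} ms")
```

In serverless deployments this load time is paid on every cold invocation, which is why benchmarks report it separately from steady-state latency.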
-#### Energy Consumption & Efficiency {#sec-benchmarking-ai-power-consumption-energy-efficiency-c819} +#### Energy Consumption & Efficiency {#sec-benchmarking-ai-energy-consumption-efficiency-ad66} Since inference workloads run continuously in production, power consumption and energy efficiency are critical considerations. This is particularly important for mobile and edge devices, where battery life and thermal constraints limit available computational resources. Even in large-scale cloud environments, power efficiency directly impacts operational costs and sustainability goals. @@ -1581,7 +1581,7 @@ The deployment environment also plays a significant role in determining evaluati Ultimately, evaluating inference performance requires a holistic approach. Focusing on a single metric, such as latency or energy efficiency, provides an incomplete picture. Instead, all relevant dimensions must be considered together to ensure that the system meets its functional, resource, and performance goals in a balanced way. -#### Context-Dependent Metrics {#sec-benchmarking-ai-metric-prioritization-deployment-context-321e} +#### Context-Dependent Metrics {#sec-benchmarking-ai-contextdependent-metrics-620b} Different deployment scenarios require fundamentally different metric priorities, as the operational constraints and success criteria vary dramatically across contexts. Understanding these priorities enables engineers to focus benchmarking efforts effectively and interpret results within appropriate decision frameworks. @tbl-metric-priorities illustrates how performance priorities shift across five major deployment contexts, revealing the systematic relationship between operational constraints and optimization targets. @@ -1649,7 +1649,7 @@ Inference performance does not always scale proportionally with additional resou Generic benchmarking results may fail to account for the specific needs of an application. 
For instance, a benchmark optimized for cloud inference might be irrelevant for edge devices, where energy and memory constraints dominate. Tailoring benchmarks to the deployment context ensures that results are meaningful and actionable. -##### Statistical Significance & Noise {#sec-benchmarking-ai-statistical-significance-measurement-noise-dfff} +##### Statistical Significance & Noise {#sec-benchmarking-ai-statistical-significance-noise-5c2a} Distinguishing meaningful performance improvements from measurement noise requires proper statistical analysis. Following the evaluation methodology principles established earlier, MLPerf addresses measurement variability by requiring multiple benchmark runs and reporting percentile-based metrics rather than single measurements [@reddi2020mlperf]. For instance, MLPerf Inference reports 99th percentile latency alongside mean performance, capturing both typical behavior and worst-case scenarios that single-run measurements might miss. This approach recognizes that system performance naturally varies due to factors like thermal throttling, memory allocation patterns, and background processes. @@ -1892,7 +1892,7 @@ Shared infrastructure complexity is further compounded by dynamic power manageme Support infrastructure, particularly cooling systems, is a major component of total energy consumption in large-scale deployments. Data centers must maintain operational temperatures, typically between 20-25°C, to ensure system reliability. Cooling overhead is captured in the Power Usage Effectiveness (PUE) metric, which ranges from 1.1 in highly efficient facilities to over 2.0 in less optimized ones [@barroso2019datacenter]. The interaction between compute workloads and cooling infrastructure creates complex dependencies; for example, power management techniques like DVFS not only reduce direct processor power consumption but also decrease heat generation, creating cascading effects on cooling requirements. 
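The PUE metric above translates directly into an overhead calculation: total facility energy is IT energy multiplied by PUE. A minimal sketch, with a hypothetical IT load:

```python
# Power Usage Effectiveness: total facility energy / IT equipment energy.
# The 1000 kWh IT load is a hypothetical figure for illustration.

def facility_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    return it_energy_kwh * pue

it_load = 1000.0  # kWh consumed by servers and accelerators
for pue in (1.1, 1.5, 2.0):
    overhead = facility_energy_kwh(it_load, pue) - it_load
    print(f"PUE {pue}: {overhead:.0f} kWh of cooling/support overhead")
```

At PUE 2.0, every kilowatt-hour of compute costs another kilowatt-hour of supporting infrastructure, which is why efficient facilities target values near 1.1.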
Even edge devices require basic thermal management. -### Performance vs. Energy Trade-offs {#sec-benchmarking-ai-performance-vs-energy-efficiency-b9ac} +### Performance vs. Energy Trade-offs {#sec-benchmarking-ai-performance-vs-energy-tradeoffs-4e34} The relationship between computational performance and energy efficiency is one of the most important tradeoffs in modern ML system design. As systems push for higher performance, they often encounter diminishing returns in energy efficiency due to fundamental physical limitations in semiconductor scaling and power delivery [@koomey2011web]. This relationship is particularly evident in processor frequency scaling, where increasing clock frequency by 20% typically yields only modest performance improvements (around 5%) while dramatically increasing power consumption by up to 50%, reflecting the cubic relationship between voltage, frequency, and power consumption [@le2010dynamic]. @@ -2311,7 +2311,7 @@ Analysis of these trends reveals two significant patterns: first, a plateauing o Effective benchmarking requires understanding its inherent limitations and implementing practices that mitigate these constraints. Rather than avoiding benchmarks due to their limitations, successful practitioners recognize these challenges and adapt their methodology accordingly. The following analysis examines four interconnected categories of benchmarking challenges while providing actionable guidance for addressing each limitation through improved design and interpretation practices. -### Statistical & Methodological Issues {#sec-benchmarking-ai-statistical-methodological-challenges-2c84} +### Statistical & Methodological Issues {#sec-benchmarking-ai-statistical-methodological-issues-56f4} The foundation of reliable benchmarking rests on sound statistical methodology. Three fundamental issues undermine this foundation if left unaddressed. 
@@ -3149,7 +3149,7 @@ Teams often continue using established benchmarks long after they cease to repre Many teams use academic benchmarks designed for research comparisons to evaluate production systems, overlooking fundamental differences between research and operational environments. Research benchmarks typically assume unlimited computational resources, optimal data quality, and idealized deployment conditions that rarely exist in production settings. Production systems must handle concurrent user loads, varying input quality, network latency, memory constraints, and system failures that significantly impact performance compared to controlled benchmark conditions. Additionally, production systems require optimization for multiple objectives simultaneously including cost efficiency, availability, and user experience that single-metric research benchmarks cannot capture. Effective production evaluation requires augmenting research benchmarks with operational metrics like sustained throughput under load, recovery time from failures, resource utilization efficiency, and end-to-end latency including data preprocessing and postprocessing overhead. -## Production Monitoring & Benchmarking {#sec-benchmarking-ai-production-benchmarking-continuous-monitoring-2d86} +## Production Monitoring & Benchmarking {#sec-benchmarking-ai-production-monitoring-benchmarking-64ed} The benchmarking methodologies discussed thus far—from micro to end-to-end granularity, from training to inference evaluation—primarily address system performance under controlled conditions. However, the deployment strategies introduced in @sec-ml-operations reveal that production environments introduce fundamentally different challenges requiring specialized evaluation approaches. Production machine learning systems must handle dynamic workloads, varying data quality, infrastructure failures, and concurrent user demands while maintaining consistent performance and reliability. 
This necessitates extending our benchmarking framework beyond single-point performance measurement to evaluate system behavior over time, under stress, and during failure scenarios. diff --git a/quarto/contents/core/conclusion/conclusion.qmd b/quarto/contents/core/conclusion/conclusion.qmd index 0910388bb..8a8d2af91 100644 --- a/quarto/contents/core/conclusion/conclusion.qmd +++ b/quarto/contents/core/conclusion/conclusion.qmd @@ -98,11 +98,11 @@ From @sec-sustainable-ai sustainability concerns to operational expenses, every Efficient AI systems require algorithm hardware co-optimization, not just individual component excellence. This comprehensive approach encompasses three critical dimensions: algorithm hardware matching ensures computational patterns align with target hardware capabilities (systolic arrays favor dense matrix operations while sparse accelerators require structured pruning patterns), memory hierarchy optimization provides frameworks for analyzing data movement costs and optimizing for cache locality, and energy efficiency modeling incorporates TOPS/W metrics to guide power-conscious design decisions essential for mobile and edge deployment. -## Principles Across the ML Systems Stack {#sec-conclusion-principles-manifest-across-ml-systems-stack-6b9c} +## Principles Across the ML Systems Stack {#sec-conclusion-principles-across-ml-systems-stack-59b8} Having established these six foundational principles, we turn to their practical application across the ML systems landscape. These principles are not abstract ideals but concrete guides that shaped every technical decision explored throughout our journey. Their manifestation varies by context yet remains consistent in purpose. 
We now examine how they operate across the three critical domains that structure ML systems engineering: first, building robust technical foundations where measurement and co-design establish the groundwork; second, engineering for performance at scale where optimization and planning enable growth; and finally, navigating production realities where all principles converge under operational constraints. -### Building Technical Foundations {#sec-conclusion-building-technical-foundations-808a} +### Building Technical Foundations {#sec-conclusion-building-technical-foundations-4c71} Machine learning systems engineering rests on solid technical foundations where multiple principles converge. @@ -176,15 +176,15 @@ As ML systems move beyond research labs, three deployment paradigms test differe These deployment contexts validate our core thesis: success depends on applying the six systems engineering principles systematically rather than pursuing isolated optimizations. -### Building Robust AI Systems {#sec-conclusion-building-robust-ai-systems-principle-4-plan-failure-1029} +### Building Robust AI Systems {#sec-conclusion-building-robust-ai-systems-827c} @sec-robust-ai demonstrates that robustness requires designing for failure from the ground up—Principle 4's core mandate. ML systems face unique failure modes: distribution shifts degrade accuracy, adversarial inputs exploit vulnerabilities, and edge cases reveal training data limitations. Resilient systems combine redundant hardware for fault tolerance (@sec-robust-ai), ensemble methods to reduce single-point failures (@sec-robust-ai), and uncertainty quantification to enable graceful degradation (@sec-robust-ai). As AI systems take on increasingly autonomous roles, planning for failure becomes the difference between safe deployment and catastrophic failure. 
-### AI for Societal Benefit {#sec-conclusion-realizing-ai-societal-benefit-principles-converge-147b} +### AI for Societal Benefit {#sec-conclusion-ai-societal-benefit-daba} @sec-ai-good demonstrates AI's transformative potential across healthcare, climate science, education, and accessibility—domains where all six principles converge. Climate modeling requires efficient inference (Principle 3: Optimize Bottleneck). Medical AI demands explainable decisions and continuous monitoring (Principle 1: Measure). Educational technology needs privacy-preserving personalization at global scale (Principles 2 & 4: Design for Scale, Plan for Failure). These applications validate that technical excellence alone proves insufficient—success requires interdisciplinary collaboration among technologists, domain experts, policymakers, and affected communities. -### The Path to AGI {#sec-conclusion-path-todays-systems-agi-5e0e} +### The Path to AGI {#sec-conclusion-path-agi-1c6d} The compound AI systems framework explored in @sec-agi-systems provides the architectural blueprint for advanced intelligence: modular components that can be updated independently, specialized models optimized for specific tasks, and decomposable architectures that enable interpretability and safety through multiple validation layers. The engineering challenges ahead require mastery across the full stack we have explored—from data engineering (@sec-data-engineering) and distributed training (@sec-ai-training) to model optimization (@sec-model-optimizations) and operational infrastructure (@sec-ml-operations). These systems engineering principles, not awaiting algorithmic breakthroughs, define the path toward artificial general intelligence. 
diff --git a/quarto/contents/core/efficient_ai/efficient_ai.qmd b/quarto/contents/core/efficient_ai/efficient_ai.qmd index 22047f61f..3417da543 100644 --- a/quarto/contents/core/efficient_ai/efficient_ai.qmd +++ b/quarto/contents/core/efficient_ai/efficient_ai.qmd @@ -112,7 +112,7 @@ With this understanding of efficiency dimension interactions, we can examine why Machine learning systems have followed a consistent pattern: increasing model scale through parameters, training data, and computational resources typically improves performance. This empirical observation has driven progress across natural language processing, computer vision, and speech recognition, where larger models trained on extensive datasets consistently achieve state-of-the-art results. -These scaling laws can be seen as the quantitative expression of Richard Sutton's "Bitter Lesson" from @sec-introduction-bitter-lesson-ml-systems-engineering-matters-b764: performance in machine learning is primarily driven by leveraging general methods at massive scale. The predictable power-law relationships show *how* computation, when scaled, yields better models. +These scaling laws can be seen as the quantitative expression of Richard Sutton's "Bitter Lesson" from @sec-introduction-bitter-lesson-systems-engineering-matters-dede: performance in machine learning is primarily driven by leveraging general methods at massive scale. The predictable power-law relationships show *how* computation, when scaled, yields better models. However, this scaling trajectory raises critical questions about efficiency and sustainability. As computational demands grow exponentially and data requirements increase, fundamental questions emerge regarding the point at which scaling costs outweigh performance benefits. 
To address these concerns systematically, researchers have developed scaling laws[^fn-scaling-laws]—empirical relationships that quantify how model performance relates to training resources, revealing why efficiency becomes increasingly important as systems expand in complexity. @@ -120,7 +120,7 @@ However, this scaling trajectory raises critical questions about efficiency and This section introduces scaling laws, examines their manifestation across different dimensions, and analyzes their implications for system design, establishing why the multi-dimensional efficiency optimization framework introduced earlier constitutes a fundamental requirement rather than an optional consideration. -### Scaling Laws in Practice {#sec-efficient-ai-scaling-laws-practice-48a6} +### Scaling Laws in Practice {#sec-efficient-ai-scaling-laws-practice-b7c2} The rapid evolution in AI capabilities over the past decade exemplifies this scaling trajectory. GPT-1 (2018) contained 117 million parameters and demonstrated basic sentence completion capabilities. GPT-2 (2019) scaled to 1.5 billion parameters and achieved coherent paragraph generation. GPT-3 (2020) expanded to 175 billion parameters and demonstrated sophisticated text generation across diverse domains. Each increase in model size brought dramatically improved capabilities, but at exponentially increasing costs. @@ -187,7 +187,7 @@ where loss $\mathcal{L}$ decreases as resource quantity $N$ increases, following These theoretical predictions find strong empirical support across multiple model configurations. As shown in @fig-loss-vs-n-d, the early-stopped test loss varies predictably with both dataset size and model size, and learning curves across configurations can be aligned through appropriate parameterization. 
-#### Resource-Constrained Scaling Regimes {#sec-efficient-ai-resource-scaling-regimes-7a2c} +#### Resource-Constrained Scaling Regimes {#sec-efficient-ai-resourceconstrained-scaling-regimes-bae9} Understanding how to apply scaling laws in practice requires recognizing three distinct resource allocation regimes that emerge from the fundamental trade-offs between compute budget, data availability, and optimal resource allocation. These regimes provide practical guidance for system designers navigating resource constraints. @@ -339,7 +339,7 @@ anchor=south,above=0pt,fill=white]at(axis description cs:0.1,0.45){Params}; Understanding scaling laws requires recognizing that performance improvements follow predictable patterns, but these patterns change depending on resource availability and exhibit distinct behaviors across different dimensions. Two important types of scaling regimes emerge: **data-driven regimes** that describe how performance changes with dataset size, and **temporal regimes** that describe when in the ML lifecycle we apply additional compute. -#### Data-Limited Scaling Regimes {#sec-efficient-ai-data-limited-scaling-regimes-c40f} +#### Data-Limited Scaling Regimes {#sec-efficient-ai-datalimited-scaling-regimes-af51} The relationship between generalization error and dataset size exhibits three distinct regimes, as shown in @fig-data-scaling-regimes. In the **Small Data Region**, limited examples produce high generalization error constrained by inadequate statistical estimates. As data availability increases, models enter the **Power-law Region**, where generalization error decreases predictably as a function of dataset size. This region provides the most practical benefit from data scaling. Eventually, performance reaches saturation in the **Irreducible Error Region**, approaching a performance floor determined by inherent data limitations or model capacity, beyond which additional data yields negligible improvements. 
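The saturating power-law behavior these hunks describe—a power-law region followed by an irreducible error floor—can be sketched numerically. A minimal sketch, with synthetic constants rather than fitted values from the chapter or any cited study:

```python
import numpy as np

# A saturating power law of the form the scaling-law discussion describes:
#   L(N) = a * N^(-b) + c
# where c is the irreducible error floor and b sets the slope of the
# power-law region. The constants a, b, c are illustrative assumptions.
def loss(n, a=100.0, b=0.5, c=0.1):
    return a * n ** (-b) + c

# Loss falls roughly linearly in log-log space through the power-law
# region, then flattens toward the floor c as N grows.
sizes = np.array([1e3, 1e5, 1e7, 1e9])
for n, l in zip(sizes, loss(sizes)):
    print(f"N={n:.0e}  loss={l:.4f}")
```

Plotting these points on log-log axes reproduces the three regimes qualitatively: steep early gains, a straight power-law segment, and saturation near `c`.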
@@ -478,7 +478,7 @@ The financial and environmental implications compound these challenges. Training These trade-offs demonstrate that while scaling laws provide valuable frameworks for understanding performance growth, they do not constitute unencumbered paths to improvement. Each incremental performance gain requires evaluation against corresponding resource requirements. As systems approach practical scaling limits, emphasis must transition from scaling alone to efficient scaling—a comprehensive approach balancing performance, cost, energy consumption, and environmental impact. -### When Scaling Fails {#sec-efficient-ai-when-scaling-fails-2247} +### When Scaling Fails {#sec-efficient-ai-scaling-fails-6609} While scaling laws exhibit remarkable consistency within specific operational regimes, they possess inherent limitations. As systems expand, they inevitably encounter boundaries where underlying assumptions of smooth, predictable scaling cease to hold. These breakdown points expose critical inefficiencies and emphasize the necessity for refined system design approaches. @@ -625,7 +625,7 @@ Three major approaches dominate modern algorithmic efficiency, each targeting di [^fn-knowledge-distillation]: **Knowledge Distillation**: Technique where a large "teacher" model transfers knowledge to a smaller "student" model by training the student to mimic the teacher's output probabilities. DistilBERT achieves 97% of BERT's performance with 40% fewer parameters and 60% faster inference through distillation. -#### Hardware-Algorithm Co-Design {#sec-efficient-ai-hardware-algorithm-codesign-aebf} +#### Hardware-Algorithm Co-Design {#sec-efficient-ai-hardwarealgorithm-codesign-c2b3} Algorithmic optimizations alone are insufficient; their practical benefits depend critically on hardware-software co-design. Optimization techniques must be tailored to target hardware characteristics—memory bandwidth, compute capabilities, and precision support—to achieve real-world speedups. 
For example, INT8 quantization achieves 2.3x speedup on NVIDIA V100 GPUs with tensor core support but may provide minimal benefit on hardware lacking specialized integer instructions. @@ -773,7 +773,7 @@ The evolution of algorithmic efficiency, from basic compression to hardware-awar Compute efficiency focuses on the effective use of hardware and computational resources to train and deploy machine learning models. It encompasses strategies for reducing energy consumption, optimizing processing speed, and leveraging hardware capabilities to achieve scalable and sustainable system performance. While this chapter focuses on efficiency principles and trade-offs, the detailed technical implementation of hardware acceleration—including GPU architectures, TPU design, memory systems, and custom accelerators—is covered in @sec-ai-acceleration. -#### Specialized Computing Evolution {#sec-efficient-ai-specialized-computing-evolution-e135} +#### Specialized Computing Evolution {#sec-efficient-ai-specialized-computing-evolution-a343} Understanding compute efficiency's evolution reveals why specialized hardware became essential. In the early days of machine learning, Central Processing Units (CPUs) shaped what was possible. CPUs excel at sequential processing and complex decision-making but have limited parallelism, typically 4-16 cores optimized for diverse tasks rather than the repetitive matrix operations that dominate machine learning. Training times for models were measured in days or weeks, as even relatively small datasets pushed hardware boundaries. @@ -1057,7 +1057,7 @@ Compute efficiency directly complements algorithmic and data efficiency. Compact Data efficiency focuses on optimizing the amount and quality of data required to train machine learning models effectively. 
While historically less emphasized than model or compute efficiency, data efficiency has emerged as a pivotal dimension, driven by rising costs of data collection, storage, and processing, as well as the fundamental limits of available high-quality data. -#### Data-Centric AI Approaches {#sec-efficient-ai-data-centric-ai-approaches-621b} +#### Data-Centric AI Approaches {#sec-efficient-ai-datacentric-ai-approaches-337c} In early machine learning, data efficiency was not a primary focus, as datasets were relatively small and manageable. The challenge was often acquiring enough labeled data to train models effectively. Researchers relied on curated datasets such as [UCI's Machine Learning Repository](https://archive.ics.uci.edu/)[^fn-uci], using feature selection and dimensionality reduction techniques like principal component analysis (PCA)[^fn-pca] to extract maximum value from limited data. @@ -1388,7 +1388,7 @@ Neural architecture search (NAS)[^fn-nas] takes automation further by designing Data efficiency also benefits from automation. Tools that automate dataset curation, augmentation, and active learning reduce training dataset size without sacrificing performance, prioritizing high-value data points to speed up training and reduce computational overhead [@settles2009active]. @sec-ai-frameworks explores how modern ML frameworks incorporate these automation capabilities. -### Systematic Evaluation and Assessment {#sec-efficient-ai-systematic-evaluation-assessment-3f43} +### Systematic Evaluation and Assessment {#sec-efficient-ai-systematic-evaluation-assessment-a83e} Beyond technical automation lies the broader challenge of systematic evaluation. Efficiency optimization necessitates a structured approach assessing trade-offs that extends beyond purely technical considerations. As systems transition from research to production, success criteria must encompass algorithmic performance, economic viability, and operational sustainability. 
@@ -1400,7 +1400,7 @@ This evaluation framework must be complemented by **ongoing assessment mechanism Designing an efficient machine learning system requires a holistic approach. True efficiency emerges when the entire system is considered as a whole, ensuring trade-offs are balanced across all stages of the ML pipeline from data collection to deployment. This end-to-end perspective transforms system design. -### System-Level Thinking {#sec-efficient-ai-system-level-thinking-fc95} +### System-Level Thinking {#sec-efficient-ai-systemlevel-thinking-4f8d} Efficiency is achieved not through isolated optimizations but by considering the entire pipeline as a unified whole. Each stage—data collection, model training, hardware deployment, and inference—contributes to overall system efficiency. Decisions at one stage ripple through the rest, influencing performance, resource use, and scalability. @@ -1412,7 +1412,7 @@ Deployment and inference demand precise hardware alignment. Each platform offers An end-to-end perspective ensures trade-offs are addressed holistically rather than shifting inefficiencies between pipeline stages. This systems thinking approach becomes particularly critical when deploying to resource-constrained environments, as explored in @sec-ondevice-learning. -### Context-Driven Design {#sec-efficient-ai-context-driven-design-2bd5} +### Context-Driven Design {#sec-efficient-ai-contextdriven-design-72ff} Efficiency needs differ significantly depending on lifecycle stage and deployment environment—from research prototypes to production systems, from high-performance cloud to resource-constrained edge. 
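The hardware-dependent INT8 quantization trade-off discussed in this file's hunks can be made concrete with a symmetric per-tensor scheme. This is an illustrative sketch, not the chapter's (or any framework's) implementation:

```python
import numpy as np

# Illustrative symmetric per-tensor INT8 quantization. Whether this
# yields real speedups depends on hardware integer support, as the
# hardware-algorithm co-design discussion notes.
def quantize_int8(x):
    scale = np.max(np.abs(x)) / 127.0  # map largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.random.randn(1024).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)

# Per-element round-trip error is bounded by half a quantization step.
print("max abs error:", np.max(np.abs(x - x_hat)), "<=", scale / 2)
```

The 4x storage reduction (float32 to int8) is guaranteed; the inference speedup is not, which is exactly the co-design point the text makes.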
diff --git a/quarto/contents/core/hw_acceleration/hw_acceleration.qmd b/quarto/contents/core/hw_acceleration/hw_acceleration.qmd index 2a08eba49..23e03f9c4 100644 --- a/quarto/contents/core/hw_acceleration/hw_acceleration.qmd +++ b/quarto/contents/core/hw_acceleration/hw_acceleration.qmd @@ -277,61 +277,62 @@ Before examining these computational primitives in detail, it is valuable to und ::: {#fig-accelerator-anatomy fig-env="figure" fig-pos="htb"} ```{.tikz} \begin{tikzpicture}[ - box/.style={draw, rounded corners=3pt, minimum width=2.5cm, minimum height=1cm, align=center}, - pe/.style={box, fill=blue!20}, - mem/.style={box, fill=green!20}, - compute/.style={box, fill=yellow!20, minimum width=1.5cm, minimum height=0.6cm}, - control/.style={box, fill=orange!20}, - interface/.style={box, fill=red!20} + node distance=6mm and 10mm, + font=\small\sffamily, + box/.style={rectangle, draw, thick, minimum height=1.2cm, minimum width=2.2cm, align=center, rounded corners=3pt}, + pe/.style={box, fill=blue!10, minimum height=1.5cm, minimum width=2cm}, + mem/.style={box, fill=green!10}, + compute/.style={box, fill=yellow!20, minimum width=1.8cm, minimum height=0.8cm, rounded corners=2pt}, + host/.style={box, fill=gray!10}, + conn/.style={-latex, thick}, + bus/.style={latex-latex, line width=2.5mm, draw=gray!40, text=black, font=\tiny\sffamily} ] -% Memory hierarchy at top -\node[mem] (hbm) at (0,6) {High-Bandwidth\\Memory\\(HBM)}; -\node[mem] (l2) at (4,5) {L2 Cache\\(Shared)}; -\node[mem] (l1) at (8,4) {L1 Cache\\(Local)}; +% Main Chip Boundary +\node[draw, dashed, thick, minimum width=12cm, minimum height=8cm, label={[yshift=-3mm]north:AI Accelerator Chip}] (chip) {}; -% Processing element cluster -\node[pe] (pe1) at (2,2) {PE 0}; -\node[pe] (pe2) at (4,2) {PE 1}; -\node[pe] (pe3) at (6,2) {PE 2}; -\node[pe] (pe4) at (8,2) {PE 3}; +% Host CPU and DRAM (outside the chip, to the left) +\node[host, left=of chip, xshift=-2.5cm] (cpu) {Host CPU}; +\node[mem, below=of cpu] (dram) 
{Host DRAM}; +\draw[bus] (cpu.east) -- (chip.west |- cpu) node[midway, above, yshift=2mm, align=center] {Host Interface \\ (PCIe/NVLink)}; +\draw[conn, dashed, thin] (dram.north) -- (cpu.south); -% Compute units inside PEs (showing detail for PE1) -\node[compute] (tensor1) at (1.2,1.2) {Tensor\\Core}; -\node[compute] (vector1) at (2.8,1.2) {Vector\\Unit}; -\node[compute] (sfu1) at (2,0.4) {SFU}; +% High-Bandwidth Memory (HBM) (outside the chip, to the right) +\node[mem, right=of chip, xshift=2.5cm] (hbm) {High-Bandwidth\\Memory (HBM)}; +\draw[bus] (hbm.west) -- (chip.east |- hbm) node[midway, above, yshift=2mm] {Memory Interface}; -% Show interconnect -\draw[thick, <->] (pe1) -- (pe2); -\draw[thick, <->] (pe2) -- (pe3); -\draw[thick, <->] (pe3) -- (pe4); +% L2 Cache inside the chip +\node[mem, minimum width=10cm, fill=green!20] (l2) at (chip.center) {L2 Cache (Shared)}; -% Memory connections -\draw[thick, ->] (hbm) -- (l2); -\draw[thick, ->] (l2) -- (l1); -\draw[thick, ->] (l1) -- (pe4); +% Processing Elements (PEs) arranged in a grid +\matrix[row sep=5mm, column sep=5mm] at (l2.center) { + \node[pe] (pe1) {PE}; & \node[pe] (pe2) {PE}; & \node[font=\Huge] (dots) {...}; & \node[pe] (peN) {PE}; \\ + \node[pe] (peM) {PE}; & \node[pe] (peMN) {PE}; & \node[font=\Huge] (dots2) {...}; & \node[pe] (peLast) {PE}; \\ +}; -% Host interface -\node[interface] (pcie) at (10,4) {Host Interface\\(PCIe/NVLink)}; -\draw[thick, <->] (l1) -- (pcie); +% Connections from L2 to PEs +\foreach \p in {pe1, pe2, peN, peM, peMN, peLast} { + \draw[conn, thin, gray] (l2) -- (\p); +} -% Control unit -\node[control] (ctrl) at (6,0) {Control Unit\\& Scheduler}; -\draw[thick, ->] (ctrl) -- (pe2); -\draw[thick, ->] (ctrl) -- (pe3); +% Zoom-in on a Processing Element (PE1) +\node[draw, thick, fill=blue!5, rounded corners, fit=(pe1), label={[yshift=-1.2cm]below:Processing Element}] (pe_zoom) {}; -% Labels for key concepts -\node[anchor=west, font=\footnotesize] at (0,3.5) {\textbf{Memory Hierarchy:}};
-\node[anchor=west, font=\footnotesize] at (0,3.2) {Data flows from HBM}; -\node[anchor=west, font=\footnotesize] at (0,2.9) {through caches to PEs}; +% Components inside the PE1 +\node[compute] (tensor_core) at (pe1.center) [yshift=0.8cm] {Tensor Core}; +\node[compute] (vector_unit) at (pe1.center) [yshift=-0.8cm] {Vector Unit}; +\node[compute, fill=orange!20] (sfu) at (pe1.center) [xshift=2.2cm] {SFU}; +\node[mem, fill=green!30, minimum width=1.5cm] (l1) at (pe1.center) [xshift=-2.2cm] {L1 Cache /\\Scratchpad}; -\node[anchor=west, font=\footnotesize] at (9,1.2) {\textbf{Processing Elements:}}; -\node[anchor=west, font=\footnotesize] at (9,0.9) {Contain specialized}; -\node[anchor=west, font=\footnotesize] at (9,0.6) {compute units}; +% Connections inside the PE +\draw[conn, thin, dashed] (l1) -- (tensor_core); +\draw[conn, thin, dashed] (l1) -- (vector_unit); +\draw[conn, thin, dashed] (tensor_core) -- (sfu); +\draw[conn, thin, dashed] (vector_unit) -- (sfu); \end{tikzpicture} ``` -**Anatomy of a Modern AI Accelerator**: AI accelerators integrate specialized processing elements containing tensor cores, vector units, and special function units, supported by a hierarchical memory system from high-bandwidth memory down to local caches. This architecture maximizes data reuse and parallel execution while minimizing energy-intensive data movement, forming the foundation for 100-1000× performance improvements over general-purpose processors. +**Anatomy of a Modern AI Accelerator**: AI accelerators integrate specialized processing elements containing tensor cores, vector units, and special function units, supported by a hierarchical memory system from high-bandwidth memory down to local caches. This architecture maximizes data reuse and parallel execution while minimizing energy-intensive data movement, forming the foundation for 100-1000× performance improvements over general-purpose processors.
::: ## AI Compute Primitives {#sec-ai-acceleration-ai-compute-primitives-8471} @@ -3213,7 +3214,7 @@ We have provided an overview of the key concepts and challenges in multi-chip AI acceleration. Understanding the principles and trade-offs involved in multi-chip AI acceleration enables machine learning engineers and system designers to make informed decisions about how to best deploy and optimize their models. Whether training large language models on TPU pods or deploying computer vision applications on multi-GPU systems, the ability to efficiently map computations to hardware will continue to be a critical factor in realizing the full potential of AI. -## Heterogeneous SoC AI Acceleration {#sec-ai-acceleration-heterogeneous-soc-ai-acceleration-b1bb-ai-acceleration-b1bb} +## Heterogeneous SoC AI Acceleration {#sec-ai-acceleration-heterogeneous-soc-ai-acceleration-b1bb} The multi-chip architectures examined in previous sections focused primarily on maximizing computational throughput for data center workloads, where power budgets extend to kilowatts and cooling infrastructure supports rack-scale deployments. However, the hardware acceleration principles established—specialized compute units, memory hierarchy optimization, and workload mapping strategies—must adapt dramatically when deploying AI systems in mobile and edge environments. A smartphone operates within a 2 to 5 watt power budget, autonomous vehicles require deterministic real-time guarantees, and IoT sensors must function for years on battery power. These constraints necessitate heterogeneous System-on-Chip (SoC) architectures that coordinate multiple specialized processors within a single chip while meeting stringent power, thermal, and latency requirements fundamentally different from data center deployments.
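The processor coordination these heterogeneous SoC sections describe can be sketched as a toy dispatcher. All processor names, throughput and power figures, and selection rules below are illustrative assumptions, not any vendor's scheduler or API:

```python
from dataclasses import dataclass

# Toy model of heterogeneous workload dispatch on a mobile SoC.
# Figures are invented for illustration only.
@dataclass
class Processor:
    name: str
    perf_tops: float   # peak throughput, TOPS (assumed)
    power_w: float     # active power draw, watts (assumed)

PROCESSORS = [
    Processor("NPU", perf_tops=15.0, power_w=2.0),
    Processor("GPU", perf_tops=10.0, power_w=3.5),
    Processor("DSP", perf_tops=2.0, power_w=0.5),
    Processor("CPU", perf_tops=0.5, power_w=1.0),
]

def dispatch(power_budget_w, latency_critical):
    """Pick a processor that fits the current power budget; prefer raw
    throughput when latency matters, efficiency (TOPS/W) otherwise."""
    candidates = [p for p in PROCESSORS if p.power_w <= power_budget_w]
    if not candidates:
        return None  # no processor fits: wait, or shed the workload
    key = (lambda p: p.perf_tops) if latency_critical else \
          (lambda p: p.perf_tops / p.power_w)
    return max(candidates, key=key)

print(dispatch(5.0, latency_critical=True).name)   # ample budget -> NPU
print(dispatch(0.8, latency_critical=False).name)  # tight budget -> DSP
```

Real runtimes fold in thermal state, battery level, and per-operator support, but the core idea is the same: candidate filtering by constraints, then selection by an objective that shifts with context.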
@@ -3229,7 +3230,7 @@ While Qualcomm's approach emphasizes diverse processor specialization, Apple's v Beyond these vertically integrated solutions from Qualcomm and Apple, ARM's IP licensing model offers a fundamentally different approach that enables SoC designers to customize processor combinations based on target applications. The Mali-G78 GPU's 24 cores can be paired with Ethos-N78 NPU for balanced general-purpose and AI acceleration, while the Cortex-M55 microcontroller integrates Ethos-U55 microNPU for ultra-low-power edge applications. This modular flexibility allows automotive SoCs to emphasize deterministic real-time processing while smartphone SoCs optimize for interactive performance and battery efficiency. -### Dynamic Workload Distribution Strategies {#sec-ai-acceleration-dynamic-workload-distribution-strategies-7e00-strategies-7e00} +### Dynamic Workload Distribution Strategies {#sec-ai-acceleration-dynamic-workload-distribution-strategies-7e00} With multiple specialized processors available on heterogeneous SoCs, the critical challenge becomes intelligently distributing neural network operations across these resources to maximize performance while respecting power and latency constraints. @@ -3254,7 +3255,7 @@ When DVFS alone cannot maintain the power envelope, mobile SoCs implement therma Beyond real-time power and thermal management, mobile AI systems must also adapt their computational strategies based on battery state and charging status. During low battery conditions, the system may switch from high-accuracy models to efficient approximations, migrate workloads from power-hungry NPU to energy-efficient DSP, or reduce inference frequency while maintaining application responsiveness. Conversely, during charging, the system can enable higher-performance models and increase processing frequency to deliver enhanced user experiences. 
-### Automotive Heterogeneous AI Systems {#sec-ai-acceleration-automotive-heterogeneous-ai-systems-deda-ai-systems-deda} +### Automotive Heterogeneous AI Systems {#sec-ai-acceleration-automotive-heterogeneous-ai-systems-deda} Automotive applications introduce unique heterogeneous computing challenges that combine mobile-style power efficiency with hard real-time guarantees and functional safety requirements—a combination that demands fundamentally different architectural approaches. diff --git a/quarto/contents/core/introduction/introduction.qmd b/quarto/contents/core/introduction/introduction.qmd index 034c52fa0..888f86061 100644 --- a/quarto/contents/core/introduction/introduction.qmd +++ b/quarto/contents/core/introduction/introduction.qmd @@ -256,7 +256,7 @@ These interdependencies become particularly clear when examining breakthrough mo With this three-component framework established, we must understand a fundamental difference that distinguishes ML systems from traditional software: how failures manifest and are detected. -### How ML Systems Differ: Silent Performance Degradation {#sec-introduction-ml-systems-differ-silent-performance-degradation} +### How ML Systems Differ: Silent Performance Degradation {#sec-introduction-ml-systems-differ-silent-performance-degradation-0623} Traditional software fails in obvious ways. When code breaks, applications crash, error messages appear, and monitoring systems trigger alerts. This immediate feedback enables rapid diagnosis and repair. Machine learning systems operate differently—they can continue functioning while their performance quietly degrades without triggering obvious errors. @@ -268,7 +268,7 @@ This fundamental difference in failure modes helps explain why ML models develop Understanding this challenge provides essential context as we explore how systems engineering has emerged as fundamental to AI advancement. 
With this foundation established, a natural question arises: which component of the AI Triangle matters most for advancing AI capabilities? -## The Bitter Lesson: Why Systems Engineering Matters {#sec-introduction-bitter-lesson-ml-systems-engineering-matters-b764} +## The Bitter Lesson: Why Systems Engineering Matters {#sec-introduction-bitter-lesson-systems-engineering-matters-dede} The single biggest lesson from 70 years of AI research is that systems that can leverage massive computation ultimately win. This is why systems engineering, not just algorithmic cleverness, has become the bottleneck for progress in AI. @@ -1329,7 +1329,7 @@ This interdependency explains why ML systems engineering requires holistic think The challenge landscape also explains why many research models fail to reach production. Academic ML often focuses on maximizing accuracy on benchmark datasets, potentially ignoring practical constraints like inference latency, training costs, data privacy, or operational monitoring. Production ML systems must balance accuracy against deployment feasibility, operational costs, ethical considerations, and long-term maintainability. This gap between research priorities and production realities motivates this book's emphasis on systems engineering rather than pure algorithmic innovation. -## From Challenges to Solutions: Five-Pillar Framework {#sec-introduction-challenges-solutions-fivepillar-framework-fdf0} +## From Challenges to Solutions: Five-Pillar Framework {#sec-introduction-challenges-solutions-fivepillar-framework-f4e1} The challenges we've explored—from silent performance degradation and data drift to model complexity and ethical concerns—reveal why ML systems engineering has emerged as a distinct discipline. The unique failure patterns we discussed earlier exemplify the need for specialized approaches: traditional software engineering practices cannot address systems that degrade quietly rather than failing obviously. 
These challenges cannot be addressed through algorithmic innovation alone; they require systematic engineering practices that span the entire system lifecycle from initial data collection through continuous operation and evolution. diff --git a/quarto/contents/core/ml_systems/emerging_topics_quizzes.json b/quarto/contents/core/ml_systems/emerging_topics_quizzes.json index 3497f47bd..00ca63422 100644 --- a/quarto/contents/core/ml_systems/emerging_topics_quizzes.json +++ b/quarto/contents/core/ml_systems/emerging_topics_quizzes.json @@ -7,7 +7,7 @@ }, "sections": [ { - "section_id": "#sec-ml-systems-overview-db10", + "section_id": "#sec-ml-systems-deployment-spectrum-38d0", "section_title": "Overview", "quiz_data": { "quiz_needed": true, @@ -62,7 +62,7 @@ } }, { - "section_id": "#sec-ml-systems-cloudbased-machine-learning-7606", + "section_id": "#sec-ml-systems-cloud-ml-maximizing-computational-power-f232", "section_title": "Cloud-Based Machine Learning", "quiz_data": { "quiz_needed": true, @@ -111,7 +111,7 @@ } }, { - "section_id": "#sec-ml-systems-edge-machine-learning-06ec", + "section_id": "#sec-ml-systems-edge-ml-reducing-latency-privacy-risk-31f9", "section_title": "Edge Machine Learning", "quiz_data": { "quiz_needed": true, @@ -166,7 +166,7 @@ } }, { - "section_id": "#sec-ml-systems-mobile-machine-learning-f5b5", + "section_id": "#sec-ml-systems-mobile-ml-personal-offline-intelligence-7905", "section_title": "Mobile Machine Learning", "quiz_data": { "quiz_needed": true, @@ -222,7 +222,7 @@ } }, { - "section_id": "#sec-ml-systems-tiny-machine-learning-9d4a", + "section_id": "#sec-ml-systems-tiny-ml-ubiquitous-sensing-scale-51d8", "section_title": "Tiny Machine Learning", "quiz_data": { "quiz_needed": true, @@ -271,7 +271,7 @@ } }, { - "section_id": "#sec-ml-systems-hybrid-machine-learning-1bbf", + "section_id": "#sec-ml-systems-hybrid-architectures-combining-paradigms-c1f2", "section_title": "Hybrid Machine Learning", "quiz_data": { "quiz_needed": true, @@ 
-326,7 +326,7 @@ } }, { - "section_id": "#sec-ml-systems-shared-principles-34fe", + "section_id": "#sec-ml-systems-universal-design-principles-9ec9", "section_title": "Shared Principles", "quiz_data": { "quiz_needed": true, @@ -381,7 +381,7 @@ } }, { - "section_id": "#sec-ml-systems-system-comparison-8b05", + "section_id": "#sec-ml-systems-paradigm-tradeoffs-selection-2015", "section_title": "System Comparison", "quiz_data": { "quiz_needed": true, diff --git a/quarto/contents/core/ml_systems/frontiers_quizzes.json b/quarto/contents/core/ml_systems/frontiers_quizzes.json index 920d8d56d..c24783b48 100644 --- a/quarto/contents/core/ml_systems/frontiers_quizzes.json +++ b/quarto/contents/core/ml_systems/frontiers_quizzes.json @@ -7,7 +7,7 @@ }, "sections": [ { - "section_id": "#sec-ml-systems-overview-db10", + "section_id": "#sec-ml-systems-deployment-spectrum-38d0", "section_title": "Overview", "quiz_data": { "quiz_needed": true, @@ -62,7 +62,7 @@ } }, { - "section_id": "#sec-ml-systems-cloudbased-machine-learning-7606", + "section_id": "#sec-ml-systems-cloud-ml-maximizing-computational-power-f232", "section_title": "Cloud-Based Machine Learning", "quiz_data": { "quiz_needed": true, @@ -117,7 +117,7 @@ } }, { - "section_id": "#sec-ml-systems-edge-machine-learning-06ec", + "section_id": "#sec-ml-systems-edge-ml-reducing-latency-privacy-risk-31f9", "section_title": "Edge Machine Learning", "quiz_data": { "quiz_needed": true, @@ -167,7 +167,7 @@ } }, { - "section_id": "#sec-ml-systems-mobile-machine-learning-f5b5", + "section_id": "#sec-ml-systems-mobile-ml-personal-offline-intelligence-7905", "section_title": "Mobile Machine Learning", "quiz_data": { "quiz_needed": true, @@ -223,7 +223,7 @@ } }, { - "section_id": "#sec-ml-systems-tiny-machine-learning-9d4a", + "section_id": "#sec-ml-systems-tiny-ml-ubiquitous-sensing-scale-51d8", "section_title": "Tiny Machine Learning", "quiz_data": { "quiz_needed": true, @@ -284,7 +284,7 @@ } }, { - "section_id": 
"#sec-ml-systems-hybrid-machine-learning-1bbf", + "section_id": "#sec-ml-systems-hybrid-architectures-combining-paradigms-c1f2", "section_title": "Hybrid Machine Learning", "quiz_data": { "quiz_needed": true, @@ -339,7 +339,7 @@ } }, { - "section_id": "#sec-ml-systems-shared-principles-34fe", + "section_id": "#sec-ml-systems-universal-design-principles-9ec9", "section_title": "Shared Principles", "quiz_data": { "quiz_needed": true, @@ -388,7 +388,7 @@ } }, { - "section_id": "#sec-ml-systems-system-comparison-8b05", + "section_id": "#sec-ml-systems-paradigm-tradeoffs-selection-2015", "section_title": "System Comparison", "quiz_data": { "quiz_needed": true, diff --git a/quarto/contents/core/ml_systems/ml_systems.qmd b/quarto/contents/core/ml_systems/ml_systems.qmd index 0a7ddf6a6..3fa59efe3 100644 --- a/quarto/contents/core/ml_systems/ml_systems.qmd +++ b/quarto/contents/core/ml_systems/ml_systems.qmd @@ -1380,7 +1380,7 @@ Technical constraints alone prove insufficient for deployment decisions. Organiz Successful deployment emerges from balancing technical optimization against organizational capability. Paradigm selection represents systems engineering challenges that extend well beyond pure technical requirements, encompassing team skills, operational capacity, and economic constraints. These decisions remain constrained by fundamental scaling laws explored in @sec-efficient-ai-ai-scaling-laws-a043, with operational aspects detailed in @sec-ml-operations and benchmarking approaches covered in @sec-benchmarking-ai. -## Fallacies and Pitfalls {#sec-ml-systems-fallacies-pitfalls-8074-8074} +## Fallacies and Pitfalls {#sec-ml-systems-fallacies-pitfalls-8074} Understanding deployment paradigms requires recognizing common misconceptions that can lead to poor architectural decisions. These fallacies often stem from oversimplified thinking about the fundamental trade-offs governing ML systems design. 
diff --git a/quarto/contents/core/ml_systems/ml_systems_quizzes.json b/quarto/contents/core/ml_systems/ml_systems_quizzes.json index f51cff47a..8df0b2b25 100644 --- a/quarto/contents/core/ml_systems/ml_systems_quizzes.json +++ b/quarto/contents/core/ml_systems/ml_systems_quizzes.json @@ -7,7 +7,7 @@ }, "sections": [ { - "section_id": "#sec-ml-systems-overview-db10", + "section_id": "#sec-ml-systems-deployment-spectrum-38d0", "section_title": "Overview", "quiz_data": { "quiz_needed": false, @@ -15,7 +15,7 @@ } }, { - "section_id": "#sec-ml-systems-cloudbased-machine-learning-7606", + "section_id": "#sec-ml-systems-cloud-ml-maximizing-computational-power-f232", "section_title": "Cloud-Based Machine Learning", "quiz_data": { "quiz_needed": true, @@ -52,7 +52,7 @@ { "question_type": "CALC", "question": "If a cloud-based ML system uses 1,000 NVIDIA V100 GPUs continuously for 355 days to train a model, calculate the total petaflop-days of compute used. Assume each V100 GPU provides 125 teraflops.", - "answer": "Each V100 GPU provides 125 teraflops. For 1,000 GPUs, the total teraflops is 125,000. Over 355 days, the compute used is 125,000 teraflops × 24 hours/day × 355 days = 1,065,000,000 teraflop-hours. Converting to petaflop-days: 1,065,000,000 teraflop-hours / (1,000 teraflops/petaflop) / 24 hours/day = 44,375 petaflop-days. This demonstrates the massive computational power required for training large ML models in the cloud.", + "answer": "Each V100 GPU provides 125 teraflops. For 1,000 GPUs, the total teraflops is 125,000. Over 355 days, the compute used is 125,000 teraflops \u00d7 24 hours/day \u00d7 355 days = 1,065,000,000 teraflop-hours. Converting to petaflop-days: 1,065,000,000 teraflop-hours / (1,000 teraflops/petaflop) / 24 hours/day = 44,375 petaflop-days. 
This demonstrates the massive computational power required for training large ML models in the cloud.", "learning_objective": "Apply knowledge of computational requirements to calculate the resources used in cloud-based ML training.", "_hidden_at": "2025-09-11T18:18:10.011235", "_manually_shown": true, @@ -74,7 +74,7 @@ } }, { - "section_id": "#sec-ml-systems-edge-machine-learning-06ec", + "section_id": "#sec-ml-systems-edge-ml-reducing-latency-privacy-risk-31f9", "section_title": "Edge Machine Learning", "quiz_data": { "quiz_needed": true, @@ -117,7 +117,7 @@ { "question_type": "CALC", "question": "An edge device processes 2.5 quintillion bytes of data daily. If it reduces cloud traffic by 90%, how much data is sent to the cloud each day?", - "answer": "The device processes 2.5 quintillion bytes daily. Reducing cloud traffic by 90% means only 10% of the data is sent to the cloud. Calculation: 2.5 quintillion bytes × 0.10 = 0.25 quintillion bytes. Therefore, 0.25 quintillion bytes of data is sent to the cloud each day, highlighting the efficiency of Edge ML in reducing bandwidth usage.", + "answer": "The device processes 2.5 quintillion bytes daily. Reducing cloud traffic by 90% means only 10% of the data is sent to the cloud. Calculation: 2.5 quintillion bytes \u00d7 0.10 = 0.25 quintillion bytes. 
Therefore, 0.25 quintillion bytes of data is sent to the cloud each day, highlighting the efficiency of Edge ML in reducing bandwidth usage.", "learning_objective": "Apply Edge ML concepts to calculate data reduction in cloud traffic.", "hidden": true, "_manually_hidden": true, @@ -127,7 +127,7 @@ } }, { - "section_id": "#sec-ml-systems-mobile-machine-learning-f5b5", + "section_id": "#sec-ml-systems-mobile-ml-personal-offline-intelligence-7905", "section_title": "Mobile Machine Learning", "quiz_data": { "quiz_needed": true, @@ -177,7 +177,7 @@ } }, { - "section_id": "#sec-ml-systems-tiny-machine-learning-9d4a", + "section_id": "#sec-ml-systems-tiny-ml-ubiquitous-sensing-scale-51d8", "section_title": "Tiny Machine Learning", "quiz_data": { "quiz_needed": true, @@ -213,8 +213,8 @@ }, { "question_type": "CALC", - "question": "A Tiny ML device consumes 30µW on average and is powered by a CR2032 coin-cell battery with a capacity of 225mAh. Calculate the expected operational lifespan of the device in years.", - "answer": "The battery capacity is 225mAh × 3V = 675mWh. The device consumes 30µW, which is 0.03mW. Operational lifespan = 675mWh / 0.03mW = 22,500 hours. Converting hours to years: 22,500 hours / 24 hours/day / 365 days/year ≈ 2.57 years. This calculation shows that the device can operate for approximately 2.57 years on a single CR2032 battery.", + "question": "A Tiny ML device consumes 30\u00b5W on average and is powered by a CR2032 coin-cell battery with a capacity of 225mAh. Calculate the expected operational lifespan of the device in years.", + "answer": "The battery capacity is 225mAh \u00d7 3V = 675mWh. The device consumes 30\u00b5W, which is 0.03mW. Operational lifespan = 675mWh / 0.03mW = 22,500 hours. Converting hours to years: 22,500 hours / 24 hours/day / 365 days/year \u2248 2.57 years. 
This calculation shows that the device can operate for approximately 2.57 years on a single CR2032 battery.", "learning_objective": "Apply energy consumption calculations to determine the operational lifespan of Tiny ML devices." }, { @@ -227,7 +227,7 @@ } }, { - "section_id": "#sec-ml-systems-hybrid-machine-learning-1bbf", + "section_id": "#sec-ml-systems-hybrid-architectures-combining-paradigms-c1f2", "section_title": "Hybrid Machine Learning", "quiz_data": { "quiz_needed": true, @@ -280,7 +280,7 @@ } }, { - "section_id": "#sec-ml-systems-shared-principles-34fe", + "section_id": "#sec-ml-systems-universal-design-principles-9ec9", "section_title": "Shared Principles", "quiz_data": { "quiz_needed": true, @@ -324,7 +324,7 @@ } }, { - "section_id": "#sec-ml-systems-system-comparison-8b05", + "section_id": "#sec-ml-systems-paradigm-tradeoffs-selection-2015", "section_title": "System Comparison", "quiz_data": { "quiz_needed": true, diff --git a/tools/scripts/format_tables.py b/tools/scripts/format_tables.py new file mode 100644 index 000000000..d839d7cd8 --- /dev/null +++ b/tools/scripts/format_tables.py @@ -0,0 +1,553 @@ +#!/usr/bin/env python3 +""" +Table Formatter for MLSysBook + +This script formats markdown grid tables to ensure: +1. All column headers (first row) are bolded +2. All first column entries are bolded +3. Column widths are properly calculated based on content +4. Alignment bars match the actual column widths +5. 
Content is left-aligned within cells + +Usage: + python format_tables.py --check # Check if tables are formatted correctly + python format_tables.py --fix # Format tables in place + python format_tables.py --check-all # Check all .qmd files in contents/core + python format_tables.py --fix-all # Format all .qmd files in contents/core +""" + +import argparse +import re +import sys +from pathlib import Path +from typing import List, Tuple, Optional +import unicodedata + + +def display_width(text: str) -> int: + """ + Calculate the display width of text, accounting for Unicode characters. + + Bold markers (**text**) are not counted in display width. + East Asian Wide and Fullwidth characters count as 2. + """ + # Remove bold markers for width calculation + text = text.replace('**', '') + + width = 0 + for char in text: + if unicodedata.east_asian_width(char) in ('F', 'W'): + width += 2 + else: + width += 1 + return width + + +def parse_table(lines: List[str]) -> Optional[dict]: + """ + Parse a markdown grid table into structured data. 
+ + Returns: + dict with 'start_line', 'end_line', 'header', 'separator', 'rows', 'caption' + or None if not a valid table + """ + if not lines or not lines[0].startswith('+'): + return None + + table_data = { + 'start_line': 0, + 'end_line': 0, + 'header_border': '', + 'header': '', + 'separator': '', + 'rows': [], + 'footer_border': '', + 'caption': '' + } + + i = 0 + + # First line should be top border + if not lines[i].startswith('+'): + return None + table_data['header_border'] = lines[i] + i += 1 + + # Next should be header row + if i >= len(lines) or not lines[i].startswith('|'): + return None + table_data['header'] = lines[i] + i += 1 + + # Next should be separator with := for alignment + if i >= len(lines) or not lines[i].startswith('+'): + return None + table_data['separator'] = lines[i] + i += 1 + + # Parse data rows + while i < len(lines): + line = lines[i] + if line.startswith('|'): + table_data['rows'].append(line) + i += 1 + elif line.startswith('+'): + # Row separator or footer + if i + 1 < len(lines) and lines[i + 1].startswith('|'): + # This is a row separator, include it with the row + table_data['rows'].append(line) + i += 1 + else: + # This is the footer border + table_data['footer_border'] = line + i += 1 + break + else: + break + + # Check for caption (starts with : after table) + if i < len(lines) and lines[i].strip().startswith(':'): + table_data['caption'] = lines[i] + i += 1 + + table_data['end_line'] = i + + return table_data + + +def parse_row(row: str) -> List[str]: + """Parse a table row into individual cell contents.""" + # Remove leading and trailing pipes + row = row.strip() + if row.startswith('|'): + row = row[1:] + if row.endswith('|'): + row = row[:-1] + + # Split by pipes and strip whitespace + cells = [cell.strip() for cell in row.split('|')] + return cells + + +def bold_text(text: str) -> str: + """Add bold markers to text if not already bolded. 
Returns empty string if text is empty.""" + text = text.strip() + # Don't bold empty strings + if not text: + return '' + # Don't double-bold + if text.startswith('**') and text.endswith('**'): + return text + return f"**{text}**" + + +def is_bolded(text: str) -> bool: + """Check if text is already bolded.""" + text = text.strip() + return text.startswith('**') and text.endswith('**') + + +def calculate_column_widths(header: str, rows: List[str]) -> List[int]: + """ + Calculate the width needed for each column based on content. + + Returns list of widths for each column. + """ + # Parse all rows to get cell contents + all_rows = [parse_row(header)] + for row in rows: + if row.startswith('|'): + all_rows.append(parse_row(row)) + + # Find number of columns + num_cols = len(all_rows[0]) + + # Calculate max width for each column + widths = [0] * num_cols + for row in all_rows: + for col_idx, cell in enumerate(row): + if col_idx < num_cols: + width = display_width(cell) + widths[col_idx] = max(widths[col_idx], width) + + return widths + + +def extract_alignment(separator: str) -> List[str]: + """ + Extract alignment information from separator line. 
+ + Returns list of alignments: 'left', 'center', or 'right' + """ + # Split by + to get each column separator + parts = separator.split('+')[1:-1] # Remove empty first and last + + alignments = [] + for part in parts: + part = part.strip() + if part.startswith(':') and part.endswith(':'): + alignments.append('center') + elif part.startswith(':'): + alignments.append('left') + elif part.endswith(':'): + alignments.append('right') + else: + alignments.append('left') # Default + + return alignments + + +def build_border(widths: List[int]) -> str: + """Build a border line like +----+----+----+""" + parts = ['-' * (w + 2) for w in widths] # +2 for spaces around content + return '+' + '+'.join(parts) + '+' + + +def build_separator(widths: List[int], alignments: List[str]) -> str: + """Build separator line like +:===+:===:+====:+""" + parts = [] + for width, align in zip(widths, alignments): + if align == 'center': + parts.append(':' + '=' * width + ':') + elif align == 'left': + parts.append(':' + '=' * width) + elif align == 'right': + parts.append('=' * width + ':') + else: + parts.append('=' * width) + return '+' + '+'.join(parts) + '+' + + +def format_cell(content: str, width: int, alignment: str = 'left') -> str: + """ + Format cell content to fit within the specified width. + + Pads content to match width, accounting for display width. 
+ """ + content = content.strip() + display_w = display_width(content) + padding_needed = width - display_w + + if alignment == 'center': + left_pad = padding_needed // 2 + right_pad = padding_needed - left_pad + return ' ' * left_pad + content + ' ' * right_pad + elif alignment == 'right': + return ' ' * padding_needed + content + else: # left + return content + ' ' * padding_needed + + +def format_row(cells: List[str], widths: List[int], alignments: List[str], bold_first: bool = False) -> str: + """Format a row with proper cell widths and optional bolding of first column.""" + formatted_cells = [] + for idx, (cell, width, align) in enumerate(zip(cells, widths, alignments)): + # Bold first column if requested + if idx == 0 and bold_first and not is_bolded(cell): + cell = bold_text(cell) + formatted = format_cell(cell, width, align) + formatted_cells.append(formatted) + + return '| ' + ' | '.join(formatted_cells) + ' |' + + +def format_table(table_data: dict) -> List[str]: + """ + Format a complete table with proper bolding and column widths. + + Returns formatted table as list of lines. 
+ """ + # Parse header and rows + header_cells = parse_row(table_data['header']) + alignments = extract_alignment(table_data['separator']) + + # Bold all header cells + header_cells = [bold_text(cell) for cell in header_cells] + + # Parse and prepare data rows (exclude border lines) + data_rows = [] + for row in table_data['rows']: + if row.startswith('|'): + cells = parse_row(row) + # Bold first column only if it's not empty + if cells and cells[0].strip() and not is_bolded(cells[0]): + cells[0] = bold_text(cells[0]) + data_rows.append(cells) + + # Calculate column widths based on all content + all_cells = [header_cells] + data_rows + num_cols = len(header_cells) + widths = [0] * num_cols + + for row in all_cells: + for col_idx, cell in enumerate(row): + if col_idx < num_cols: + width = display_width(cell) + widths[col_idx] = max(widths[col_idx], width) + + # Build formatted table + formatted = [] + + # Top border + formatted.append(build_border(widths)) + + # Header row + formatted.append(format_row(header_cells, widths, alignments, bold_first=False)) + + # Separator + formatted.append(build_separator(widths, alignments)) + + # Data rows with borders + for i, row_cells in enumerate(data_rows): + formatted.append(format_row(row_cells, widths, alignments, bold_first=False)) + # Add row separator (border) after each data row except the last + if i < len(data_rows) - 1: + formatted.append(build_border(widths)) + + # Footer border + formatted.append(build_border(widths)) + + # Caption + if table_data.get('caption'): + formatted.append('') # Empty line before caption + formatted.append(table_data['caption']) + + return formatted + + +def check_table_format(table_data: dict) -> List[str]: + """ + Check if a table is properly formatted. + + Returns list of issues found (empty if table is correct). 
+ """ + issues = [] + + # Parse header + header_cells = parse_row(table_data['header']) + + # Check if all headers are bolded + for idx, cell in enumerate(header_cells): + if not is_bolded(cell): + issues.append(f"Header column {idx + 1} is not bolded: '{cell}'") + + # Parse data rows and check first column (skip empty cells) + row_num = 1 + for row in table_data['rows']: + if row.startswith('|'): + cells = parse_row(row) + # Only check non-empty first column cells + if cells and cells[0].strip() and not is_bolded(cells[0]): + issues.append(f"First column in row {row_num} is not bolded: '{cells[0]}'") + row_num += 1 + + # Check column width consistency + alignments = extract_alignment(table_data['separator']) + header_cells_bolded = [bold_text(cell) for cell in header_cells] + + data_rows = [] + for row in table_data['rows']: + if row.startswith('|'): + cells = parse_row(row) + # Bold first column only if it's not empty + if cells and cells[0].strip() and not is_bolded(cells[0]): + cells[0] = bold_text(cells[0]) + data_rows.append(cells) + + # Calculate expected widths + all_cells = [header_cells_bolded] + data_rows + num_cols = len(header_cells) + expected_widths = [0] * num_cols + + for row in all_cells: + for col_idx, cell in enumerate(row): + if col_idx < num_cols: + width = display_width(cell) + expected_widths[col_idx] = max(expected_widths[col_idx], width) + + # Check if current borders match expected widths + expected_border = build_border(expected_widths) + if table_data['header_border'].strip() != expected_border.strip(): + issues.append(f"Border widths don't match content widths") + + return issues + + +def process_file(file_path: Path, fix: bool = False) -> Tuple[int, int]: + """ + Process a single file to check or fix table formatting. 
+ + Returns (tables_checked, tables_with_issues) + """ + content = file_path.read_text(encoding='utf-8') + lines = content.split('\n') + + tables_checked = 0 + tables_with_issues = 0 + modified = False + + i = 0 + new_lines = [] + + while i < len(lines): + # Check if this might be a table start + if lines[i].startswith('+'): + # Try to parse table + table_lines = [] + j = i + while j < len(lines): + if lines[j].strip() == '' and table_lines and not lines[j - 1].startswith(':'): + break + if lines[j].startswith(':') and table_lines: + table_lines.append(lines[j]) + j += 1 + break + if lines[j].startswith('+') or lines[j].startswith('|'): + table_lines.append(lines[j]) + j += 1 + else: + break + + table_data = parse_table(table_lines) + + if table_data: + tables_checked += 1 + issues = check_table_format(table_data) + + if issues: + tables_with_issues += 1 + if not fix: + print(f" Issues found in table at line {i + 1}:") + for issue in issues: + print(f" - {issue}") + else: + # Format the table + formatted = format_table(table_data) + new_lines.extend(formatted) + modified = True + else: + # Table is already correct + new_lines.extend(table_lines) + + i = j + continue + + # Not a table, keep line as is + new_lines.append(lines[i]) + i += 1 + + if fix and modified: + # Write back to file + file_path.write_text('\n'.join(new_lines), encoding='utf-8') + print(f" Fixed {tables_with_issues} tables") + + return tables_checked, tables_with_issues + + +def find_qmd_files(base_path: Path) -> List[Path]: + """Find all .qmd files in contents/core directory.""" + core_path = base_path / "quarto" / "contents" / "core" + if not core_path.exists(): + print(f"Error: {core_path} does not exist") + return [] + + qmd_files = list(core_path.rglob("*.qmd")) + return sorted(qmd_files) + + +def main(): + parser = argparse.ArgumentParser( + description="Format markdown grid tables in MLSysBook", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + # Check a 
single file + python format_tables.py --check quarto/contents/core/optimizations/optimizations.qmd + + # Fix a single file + python format_tables.py --fix quarto/contents/core/optimizations/optimizations.qmd + + # Check all files + python format_tables.py --check-all + + # Fix all files + python format_tables.py --fix-all + """ + ) + + group = parser.add_mutually_exclusive_group(required=True) + group.add_argument('--check', metavar='FILE', help='Check table formatting in a file') + group.add_argument('--fix', metavar='FILE', help='Fix table formatting in a file') + group.add_argument('--check-all', action='store_true', help='Check all .qmd files') + group.add_argument('--fix-all', action='store_true', help='Fix all .qmd files') + + args = parser.parse_args() + + # Determine workspace root (assume script is in tools/scripts/) + script_path = Path(__file__).resolve() + workspace_root = script_path.parent.parent.parent + + if args.check or args.fix: + # Process single file + file_path = Path(args.check or args.fix) + if not file_path.is_absolute(): + file_path = workspace_root / file_path + + if not file_path.exists(): + print(f"Error: File {file_path} does not exist") + return 1 + + fix_mode = bool(args.fix) + print(f"{'Fixing' if fix_mode else 'Checking'} {file_path.relative_to(workspace_root)}") + + tables_checked, tables_with_issues = process_file(file_path, fix=fix_mode) + + print(f" Found {tables_checked} tables, {tables_with_issues} with issues") + + if not fix_mode and tables_with_issues > 0: + return 1 + + else: + # Process all files + qmd_files = find_qmd_files(workspace_root) + + if not qmd_files: + print("No .qmd files found") + return 1 + + fix_mode = args.fix_all + print(f"{'Fixing' if fix_mode else 'Checking'} {len(qmd_files)} files...") + print() + + total_tables = 0 + total_issues = 0 + files_with_issues = [] + + for qmd_file in qmd_files: + tables_checked, tables_with_issues = process_file(qmd_file, fix=fix_mode) + + if tables_checked > 0: + 
rel_path = qmd_file.relative_to(workspace_root) + print(f"{rel_path}: {tables_checked} tables, {tables_with_issues} with issues") + + total_tables += tables_checked + total_issues += tables_with_issues + + if tables_with_issues > 0: + files_with_issues.append(rel_path) + + print() + print(f"Total: {total_tables} tables checked, {total_issues} with issues") + + if not fix_mode and total_issues > 0: + print() + print("Files with formatting issues:") + for file_path in files_with_issues: + print(f" - {file_path}") + return 1 + + return 0 + + +if __name__ == '__main__': + sys.exit(main()) diff --git a/tools/scripts/test_format_tables.py b/tools/scripts/test_format_tables.py new file mode 100644 index 000000000..eed31e0b3 --- /dev/null +++ b/tools/scripts/test_format_tables.py @@ -0,0 +1,343 @@ +#!/usr/bin/env python3 +""" +Test cases for table formatter. + +Tests various edge cases including: +- Standard tables with multiple rows +- Tables with empty cells +- Tables with multi-row cells +- Tables with Unicode characters +- Tables with already bolded content +""" + +import sys +from pathlib import Path +from format_tables import ( + display_width, + parse_table, + parse_row, + bold_text, + is_bolded, + calculate_column_widths, + extract_alignment, + build_border, + build_separator, + format_cell, + format_row, + format_table, + check_table_format +) + + +def test_display_width(): + """Test display width calculation.""" + print("Testing display_width...") + + # Basic ASCII + assert display_width("hello") == 5 + + # Bold markers should not count + assert display_width("**hello**") == 5 + + # Unicode characters + assert display_width("↑↑ High") == 7 + + # Mixed + assert display_width("**↑↑ High**") == 7 + + print(" ✓ display_width tests passed") + + +def test_bold_text(): + """Test bolding text.""" + print("Testing bold_text...") + + # Basic text + assert bold_text("hello") == "**hello**" + + # Already bolded + assert bold_text("**hello**") == "**hello**" + + # Empty text 
+ assert bold_text("") == "" + assert bold_text(" ") == "" + + # Text with spaces + assert bold_text(" hello ") == "**hello**" + + print(" ✓ bold_text tests passed") + + +def test_is_bolded(): + """Test checking if text is bolded.""" + print("Testing is_bolded...") + + assert is_bolded("**hello**") == True + assert is_bolded("hello") == False + assert is_bolded("**hello") == False + assert is_bolded("hello**") == False + assert is_bolded("") == False + + print(" ✓ is_bolded tests passed") + + +def test_parse_row(): + """Test parsing table rows.""" + print("Testing parse_row...") + + row = "| Header 1 | Header 2 | Header 3 |" + cells = parse_row(row) + assert cells == ["Header 1", "Header 2", "Header 3"] + + # Empty cells + row = "| Value 1 | | Value 3 |" + cells = parse_row(row) + assert cells == ["Value 1", "", "Value 3"] + + print(" ✓ parse_row tests passed") + + +def test_extract_alignment(): + """Test extracting alignment from separator.""" + print("Testing extract_alignment...") + + # Left aligned + sep = "+:===+:===+:===+" + alignments = extract_alignment(sep) + assert alignments == ["left", "left", "left"] + + # Center aligned + sep = "+:===:+:===:+:===:+" + alignments = extract_alignment(sep) + assert alignments == ["center", "center", "center"] + + # Mixed + sep = "+:===+:===:+===:+" + alignments = extract_alignment(sep) + assert alignments == ["left", "center", "right"] + + print(" ✓ extract_alignment tests passed") + + +def test_build_border(): + """Test building border lines.""" + print("Testing build_border...") + + widths = [10, 15, 20] + border = build_border(widths) + expected = "+------------+-----------------+----------------------+" + assert border == expected + + print(" ✓ build_border tests passed") + + +def test_build_separator(): + """Test building separator lines.""" + print("Testing build_separator...") + + widths = [10, 15, 20] + alignments = ["left", "center", "right"] + sep = build_separator(widths, alignments) + expected = 
"+:==========+:===============:+====================:+" + assert sep == expected + + print(" ✓ build_separator tests passed") + + +def test_format_cell(): + """Test formatting cell content.""" + print("Testing format_cell...") + + # Left aligned + cell = format_cell("Hello", 10, "left") + assert cell == "Hello " + assert len(cell) == 10 + + # Center aligned + cell = format_cell("Hi", 10, "center") + assert cell == " Hi " + assert len(cell) == 10 + + # Right aligned + cell = format_cell("Bye", 10, "right") + assert cell == " Bye" + assert len(cell) == 10 + + # With Unicode + cell = format_cell("↑ High", 10, "left") + assert len(cell) == 10 + + print(" ✓ format_cell tests passed") + + +def test_simple_table(): + """Test formatting a simple table.""" + print("Testing simple table formatting...") + + table_lines = [ + "+----------+----------+", + "| Header 1 | Header 2 |", + "+:=========+:=========+", + "| Value 1 | Value 2 |", + "+----------+----------+", + "| Value 3 | Value 4 |", + "+----------+----------+", + "", + ": Test table caption {#tbl-test}" + ] + + table_data = parse_table(table_lines) + assert table_data is not None + + # Check for issues (should find unbolded headers) + issues = check_table_format(table_data) + assert len(issues) > 0 + + # Format the table + formatted = format_table(table_data) + + # Check that headers are bolded + assert "**Header 1**" in formatted[1] + assert "**Header 2**" in formatted[1] + + # Check that first column is bolded + assert "**Value 1**" in formatted[3] + assert "**Value 3**" in formatted[5] + + print(" ✓ Simple table formatting passed") + + +def test_table_with_empty_cells(): + """Test table with empty cells in first column.""" + print("Testing table with empty cells...") + + table_lines = [ + "+-----------+---------+", + "| Technique | Goal |", + "+:==========+:=======:+", + "| Pruning | Reduce |", + "+-----------+---------+", + "| | Size |", + "+-----------+---------+" + ] + + table_data = parse_table(table_lines) + 
assert table_data is not None + + # Format the table + formatted = format_table(table_data) + + # Check that headers are bolded + assert "**Technique**" in formatted[1] + + # Check that first column with content is bolded + assert "**Pruning**" in formatted[3] + + # Empty cell should remain empty (no bold markers) + # Should not have "****" for empty cells + assert "****" not in formatted[5] + + print(" ✓ Table with empty cells passed") + + +def test_table_with_unicode(): + """Test table with Unicode characters.""" + print("Testing table with Unicode characters...") + + table_lines = [ + "+----------+----------+", + "| Type | Status |", + "+:=========+:========:+", + "| Memory | ↑↑ High |", + "+----------+----------+", + "| Speed | → Neutral|", + "+----------+----------+" + ] + + table_data = parse_table(table_lines) + assert table_data is not None + + formatted = format_table(table_data) + + # Check formatting preserved Unicode + assert "↑↑ High" in " ".join(formatted) + assert "→ Neutral" in " ".join(formatted) + + print(" ✓ Table with Unicode passed") + + +def test_already_formatted_table(): + """Test table that's already properly formatted.""" + print("Testing already formatted table...") + + table_lines = [ + "+--------------+--------------+", + "| **Header 1** | **Header 2** |", + "+:============:+:============:+", + "| **Row 1** | Value |", + "+--------------+--------------+", + "| **Row 2** | Value |", + "+--------------+--------------+" + ] + + table_data = parse_table(table_lines) + assert table_data is not None + + # Should have no issues + issues = check_table_format(table_data) + # Note: May still have border width issues to check + + formatted = format_table(table_data) + + # Headers should stay bolded (not double-bolded) + assert "**Header 1**" in formatted[1] + assert "****Header 1****" not in formatted[1] + + print(" ✓ Already formatted table passed") + + +def run_all_tests(): + """Run all test cases.""" + print("=" * 60) + print("Running Table 
Formatter Tests") + print("=" * 60) + print() + + try: + test_display_width() + test_bold_text() + test_is_bolded() + test_parse_row() + test_extract_alignment() + test_build_border() + test_build_separator() + test_format_cell() + test_simple_table() + test_table_with_empty_cells() + test_table_with_unicode() + test_already_formatted_table() + + print() + print("=" * 60) + print("All tests passed! ✅") + print("=" * 60) + return 0 + + except AssertionError as e: + print() + print("=" * 60) + print(f"Test failed: {e}") + print("=" * 60) + return 1 + except Exception as e: + print() + print("=" * 60) + print(f"Error running tests: {e}") + import traceback + traceback.print_exc() + print("=" * 60) + return 1 + + +if __name__ == '__main__': + sys.exit(run_all_tests())