Merge pull request #579 from harvard-edge/dev

Preparing major v0.3.0 release
This commit is contained in:
Vijay Janapa Reddi
2025-01-02 20:51:36 -05:00
committed by GitHub
9 changed files with 5068 additions and 458 deletions

View File

@@ -41,13 +41,6 @@
"profile": "https://github.com/Mjrovai",
"contributions": []
},
{
"login": "Sara-Khosravi",
"name": "Sara Khosravi",
"avatar_url": "https://avatars.githubusercontent.com/Sara-Khosravi",
"profile": "https://github.com/Sara-Khosravi",
"contributions": []
},
{
"login": "kai4avaya",
"name": "Kai Kleinbard",
@@ -55,6 +48,13 @@
"profile": "https://github.com/kai4avaya",
"contributions": []
},
{
"login": "Sara-Khosravi",
"name": "Sara Khosravi",
"avatar_url": "https://avatars.githubusercontent.com/Sara-Khosravi",
"profile": "https://github.com/Sara-Khosravi",
"contributions": []
},
{
"login": "V0XNIHILI",
"name": "Douwe den Blanken",
@@ -69,13 +69,6 @@
"profile": "https://github.com/shanzehbatool",
"contributions": []
},
{
"login": "mpstewart1",
"name": "Matthew Stewart",
"avatar_url": "https://avatars.githubusercontent.com/mpstewart1",
"profile": "https://github.com/mpstewart1",
"contributions": []
},
{
"login": "eliasab16",
"name": "Elias",
@@ -90,6 +83,13 @@
"profile": "https://github.com/JaredP94",
"contributions": []
},
{
"login": "mpstewart1",
"name": "Matthew Stewart",
"avatar_url": "https://avatars.githubusercontent.com/mpstewart1",
"profile": "https://github.com/mpstewart1",
"contributions": []
},
{
"login": "ishapira1",
"name": "Itai Shapira",

View File

@@ -125,15 +125,15 @@ This project follows the [all-contributors](https://allcontributors.org) specifi
<td align="center" valign="top" width="20%"><a href="https://github.com/Mjrovai"><img src="https://avatars.githubusercontent.com/Mjrovai?s=100" width="100px;" alt="Marcelo Rovai"/><br /><sub><b>Marcelo Rovai</b></sub></a><br /></td>
</tr>
<tr>
<td align="center" valign="top" width="20%"><a href="https://github.com/Sara-Khosravi"><img src="https://avatars.githubusercontent.com/Sara-Khosravi?s=100" width="100px;" alt="Sara Khosravi"/><br /><sub><b>Sara Khosravi</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/kai4avaya"><img src="https://avatars.githubusercontent.com/kai4avaya?s=100" width="100px;" alt="Kai Kleinbard"/><br /><sub><b>Kai Kleinbard</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/Sara-Khosravi"><img src="https://avatars.githubusercontent.com/Sara-Khosravi?s=100" width="100px;" alt="Sara Khosravi"/><br /><sub><b>Sara Khosravi</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/V0XNIHILI"><img src="https://avatars.githubusercontent.com/V0XNIHILI?s=100" width="100px;" alt="Douwe den Blanken"/><br /><sub><b>Douwe den Blanken</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/shanzehbatool"><img src="https://avatars.githubusercontent.com/shanzehbatool?s=100" width="100px;" alt="shanzehbatool"/><br /><sub><b>shanzehbatool</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/mpstewart1"><img src="https://avatars.githubusercontent.com/mpstewart1?s=100" width="100px;" alt="Matthew Stewart"/><br /><sub><b>Matthew Stewart</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/eliasab16"><img src="https://avatars.githubusercontent.com/eliasab16?s=100" width="100px;" alt="Elias"/><br /><sub><b>Elias</b></sub></a><br /></td>
</tr>
<tr>
<td align="center" valign="top" width="20%"><a href="https://github.com/eliasab16"><img src="https://avatars.githubusercontent.com/eliasab16?s=100" width="100px;" alt="Elias"/><br /><sub><b>Elias</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/JaredP94"><img src="https://avatars.githubusercontent.com/JaredP94?s=100" width="100px;" alt="Jared Ping"/><br /><sub><b>Jared Ping</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/mpstewart1"><img src="https://avatars.githubusercontent.com/mpstewart1?s=100" width="100px;" alt="Matthew Stewart"/><br /><sub><b>Matthew Stewart</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/ishapira1"><img src="https://avatars.githubusercontent.com/ishapira1?s=100" width="100px;" alt="Itai Shapira"/><br /><sub><b>Itai Shapira</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/harvard-edge/cs249r_book/graphs/contributors"><img src="https://www.gravatar.com/avatar/8863743b4f26c1a20e730fcf7ebc3bc0?d=identicon&s=100?s=100" width="100px;" alt="Maximilian Lam"/><br /><sub><b>Maximilian Lam</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/jaysonzlin"><img src="https://avatars.githubusercontent.com/jaysonzlin?s=100" width="100px;" alt="Jayson Lin"/><br /><sub><b>Jayson Lin</b></sub></a><br /></td>

View File

@@ -244,7 +244,7 @@ format:
reference-location: margin
citation-location: margin
sidenote: true Enable sidenotes for Tufte style
sidenote: true #Enable sidenotes for Tufte style
linkcolor: "#A51C30"
urlcolor: "#A51C30"
highlight-style: github

View File

@@ -44,7 +44,7 @@ In addition to distributed training, we discussed techniques for optimizing the
Deploying trained ML models is more complex than simply running the networks; efficiency is critical (@sec-efficient_ai). In this chapter on AI efficiency, we emphasized that efficiency is not merely a luxury but a necessity in artificial intelligence systems. We dug into the key concepts underpinning AI systems' efficiency, recognizing that the computational demands on neural networks can be daunting, even for minimal systems. For AI to be seamlessly integrated into everyday devices and essential systems, it must perform optimally within the constraints of limited resources while maintaining its efficacy.
Throughout the book, we have highlighted the importance of pursuing efficiency to ensure that AI models are streamlined, rapid, and sustainable. By optimizing models for efficiency, we can widen their applicability across various platforms and scenarios, enabling AI to be deployed in resource-constrained environments such as embedded systems and edge devices. This pursuit of efficiency is crucial for the widespread adoption and practical implementation of AI technologies in real-world applications.
Throughout the book, we have highlighted the importance of pursuing efficiency to ensure that AI models are streamlined, rapid, and sustainable. By optimizing models for efficiency, we can widen their applicability across various platforms and scenarios, enabling AI to be deployed in resource-constrained environments such as embedded systems and edge devices. This pursuit of efficiency is necessary for the widespread adoption and practical implementation of AI technologies in real-world applications.
## Optimizing ML Model Architectures
@@ -90,7 +90,7 @@ In addition to security, we addressed the critical issue of data privacy. Techni
## Upholding Ethical Considerations
As we embrace ML advancements in all facets of our lives, it is crucial to remain mindful of the ethical considerations that will shape the future of AI (@sec-responsible_ai). Fairness, transparency, accountability, and privacy in AI systems will be paramount as they become more integrated into our lives and decision-making processes.
As we embrace ML advancements in all facets of our lives, it is essential to remain mindful of the ethical considerations that will shape the future of AI (@sec-responsible_ai). Fairness, transparency, accountability, and privacy in AI systems will be paramount as they become more integrated into our lives and decision-making processes.
As AI systems become more pervasive and influential, it is important to ensure that they are designed and deployed in a manner that upholds ethical principles. This means actively mitigating biases, promoting fairness, and preventing discriminatory outcomes. Additionally, ethical AI design ensures transparency in how AI systems make decisions, enabling users to understand and trust their outputs.
@@ -98,11 +98,11 @@ Accountability is another critical ethical consideration. As AI systems take on
Ethical frameworks, regulations, and standards will be essential to address these ethical challenges. These frameworks should guide the responsible development and deployment of AI technologies, ensuring that they align with societal values and promote the well-being of individuals and communities.
Moreover, ongoing discussions and collaborations among researchers, practitioners, policymakers, and society will be crucial in navigating the ethical landscape of AI. These conversations should be inclusive and diverse, bringing together different perspectives and expertise to develop comprehensive and equitable solutions. As we move forward, it is the collective responsibility of all stakeholders to prioritize ethical considerations in the development and deployment of AI systems.
Moreover, ongoing discussions and collaborations among researchers, practitioners, policymakers, and society will be important in navigating the ethical landscape of AI. These conversations should be inclusive and diverse, bringing together different perspectives and expertise to develop comprehensive and equitable solutions. As we move forward, it is the collective responsibility of all stakeholders to prioritize ethical considerations in the development and deployment of AI systems.
## Promoting Sustainability
The increasing computational demands of machine learning, particularly for training large models, have raised concerns about their environmental impact due to high energy consumption and carbon emissions (@sec-sustainable_ai). As the scale and complexity of models continue to grow, addressing the sustainability challenges associated with AI development becomes imperative. To mitigate the environmental footprint of AI, the development of energy-efficient algorithms is crucial. This involves optimizing models and training procedures to minimize computational requirements while maintaining performance. Techniques such as model compression, quantization, and efficient neural architecture search can help reduce the energy consumption of AI systems.
The increasing computational demands of machine learning, particularly for training large models, have raised concerns about their environmental impact due to high energy consumption and carbon emissions (@sec-sustainable_ai). As the scale and complexity of models continue to grow, addressing the sustainability challenges associated with AI development becomes imperative. To mitigate the environmental footprint of AI, the development of energy-efficient algorithms is necessary. This involves optimizing models and training procedures to minimize computational requirements while maintaining performance. Techniques such as model compression, quantization, and efficient neural architecture search can help reduce the energy consumption of AI systems.
Using renewable energy sources to power AI infrastructure is another important step towards sustainability. By transitioning to clean energy sources such as solar, wind, and hydropower, the carbon emissions associated with AI development can be significantly reduced. This requires a concerted effort from the AI community and support from policymakers and industry leaders to invest in and adopt renewable energy solutions. In addition, exploring alternative computing paradigms, such as neuromorphic and photonic computing, holds promise for developing more energy-efficient AI systems. By developing hardware and algorithms that emulate the brain's processing mechanisms, we can potentially create AI systems that are both powerful and sustainable.
@@ -124,17 +124,17 @@ As we look to the future, the trajectory of ML systems points towards a paradigm
We anticipate a growing emphasis on data curation, labeling, and augmentation techniques in the coming years. These practices aim to ensure that models are trained on high-quality, representative data that accurately reflects the complexities and nuances of real-world scenarios. By focusing on data quality and diversity, we can mitigate the risks of biased or skewed models that may perpetuate unfair or discriminatory outcomes.
This data-centric approach will be crucial in addressing the challenges of bias, fairness, and generalizability in ML systems. By actively seeking out and incorporating diverse and inclusive datasets, we can develop more robust, equitable, and applicable models for various contexts and populations. Moreover, the emphasis on data will drive advancements in techniques such as data augmentation, where existing datasets are expanded and diversified through data synthesis, translation, and generation. These techniques can help overcome the limitations of small or imbalanced datasets, enabling the development of more accurate and generalizable models.
This data-centric approach will be vital in addressing the challenges of bias, fairness, and generalizability in ML systems. By actively seeking out and incorporating diverse and inclusive datasets, we can develop more robust, equitable, and applicable models for various contexts and populations. Moreover, the emphasis on data will drive advancements in techniques such as data augmentation, where existing datasets are expanded and diversified through data synthesis, translation, and generation. These techniques can help overcome the limitations of small or imbalanced datasets, enabling the development of more accurate and generalizable models.
In recent years, generative AI has taken the field by storm, demonstrating remarkable capabilities in creating realistic images, videos, and text. However, the rise of generative AI also brings new challenges for ML systems (@sec-generative_ai). Unlike traditional ML systems, generative models often demand more computational resources and pose challenges in terms of scalability and efficiency. Furthermore, evaluating and benchmarking generative models presents difficulties, as traditional metrics used for classification tasks may not be directly applicable. Developing robust evaluation frameworks for generative models is an active area of research.
In recent years, generative AI has taken the field by storm, demonstrating remarkable capabilities in creating realistic images, videos, and text. However, the rise of generative AI also brings new challenges for ML systems. Unlike traditional ML systems, generative models often demand more computational resources and pose challenges in terms of scalability and efficiency. Furthermore, evaluating and benchmarking generative models presents difficulties, as traditional metrics used for classification tasks may not be directly applicable. Developing robust evaluation frameworks for generative models is an active area of research, and something we hope to write about soon!
Understanding and addressing these system challenges and ethical considerations will be crucial in shaping the future of generative AI and its impact on society. As ML practitioners and researchers, we are responsible for advancing the technical capabilities of generative models and developing robust systems and frameworks that can mitigate potential risks and ensure the beneficial application of this powerful technology.
Understanding and addressing these system challenges and ethical considerations will be important in shaping the future of generative AI and its impact on society. As ML practitioners and researchers, we are responsible for advancing the technical capabilities of generative models and developing robust systems and frameworks that can mitigate potential risks and ensure the beneficial application of this powerful technology.
## Applying AI for Good
The potential for AI to be used for social good is vast, provided that responsible ML systems are developed and deployed at scale across various use cases (@sec-ai_for_good). To realize this potential, it is essential for researchers and practitioners to actively engage in the process of learning, experimentation, and pushing the boundaries of what is possible.
Throughout the development of ML systems, it is crucial to remember the key themes and lessons explored in this book. These include the importance of data quality and diversity, the pursuit of efficiency and robustness, the potential of TinyML and neuromorphic computing, and the imperative of security and privacy. These insights inform the work and guide the decisions of those involved in developing AI systems.
Throughout the development of ML systems, it is important to remember the key themes and lessons explored in this book. These include the importance of data quality and diversity, the pursuit of efficiency and robustness, the potential of TinyML and neuromorphic computing, and the imperative of security and privacy. These insights inform the work and guide the decisions of those involved in developing AI systems.
It is important to recognize that the development of AI is not solely a technical endeavor but also a deeply human one. It requires collaboration, empathy, and a commitment to understanding the societal implications of the systems being created. Engaging with experts from diverse fields, such as ethics, social sciences, and policy, is essential to ensure that the AI systems developed are technically sound, socially responsible, and beneficial. Embracing the opportunity to be part of this transformative field and shaping its future is a privilege and a responsibility. By working together, we can create a world where ML systems serve as tools for positive change and improving the human condition.

View File

@@ -38,7 +38,7 @@ Deep learning architecture stands for specific representation or organizations o
Neural network architectures have evolved to address specific pattern processing challenges. Whether processing arbitrary feature relationships, exploiting spatial patterns, managing temporal dependencies, or handling dynamic information flow, each architectural pattern emerged from particular computational needs. These architectures, from a computer systems perspective, require an examination of how their computational patterns map to system resources.
Most often the architectures are discussed in terms of their algorithmic structures (MLPs, CNNs, RNNs, Transformers). However, in this chapter we take a more fundamental approach by examining how their computational patterns map to hardware resources. Each section analyzes how specific pattern processing needs influence algorithmic structure and how these structures map to computer system resources. The implications for computer system design require examining how their computational patterns map to hardware resources. The mapping from algorithmic requirements to computer system design involves several key considerations:
Most often the architectures are discussed in terms of their algorithmic structures (MLPs, CNNs, RNNs, Transformers). However, in this chapter we take a more fundamental approach by examining how their computational patterns map to hardware resources. Each section analyzes how specific pattern processing needs influence algorithmic structure and how these structures map to computer system resources. The mapping from algorithmic requirements to computer system design involves several key considerations:
1. Memory access patterns: How data moves through the memory hierarchy
2. Computation characteristics: The nature and organization of arithmetic operations
@@ -53,7 +53,7 @@ Multi-Layer Perceptrons (MLPs) represent the most direct extension of neural net
When applied to the MNIST handwritten digit recognition challenge, an MLP reveals its computational power by transforming a complex 28×28 pixel image into a precise digit classification. By treating each of the 784 pixels as an equally weighted input, the network learns to decompose visual information through a systematic progression of layers, converting raw pixel intensities into increasingly abstract representations that capture the essential characteristics of handwritten digits.
### Pattern Processing Need
### Pattern Processing Needs
Deep learning systems frequently encounter problems where any input feature could potentially influence any output---there are no inherent constraints on these relationships. Consider analyzing financial market data: any economic indicator might affect any market outcome or in natural language processing, where the meaning of a word could depend on any other word in the sentence. These scenarios demand an architectural pattern capable of learning arbitrary relationships across all input features.
@@ -61,15 +61,6 @@ Dense pattern processing addresses this fundamental need by enabling several key
For example, in the MNIST digit recognition task, while humans might focus on specific parts of digits (like loops in '6' or crossings in '8'), we cannot definitively say which pixel combinations are important for classification. A '7' written with a serif could share pixel patterns with a '2', while variations in handwriting mean discriminative features might appear anywhere in the image. This uncertainty about feature relationships necessitates a dense processing approach where every pixel can potentially influence the classification decision.
<!-- The need for processing arbitrary relationships, however, comes with significant computational implications. When every output potentially depends on every input, the system must:
* Access all input values for each computation
* Store weights for all possible connections
* Compute across all these connections
* Move data between all elements of the network
These requirements directly influence how we structure both algorithms and computer systems to handle dense pattern processing efficiently. -->
### Algorithmic Structure
To enable unrestricted feature interactions, MLPs implement a direct algorithmic solution: connect everything to everything. This is realized through a series of fully-connected layers, where each neuron connects to every neuron in adjacent layers. The dense connectivity pattern translates mathematically into matrix multiplication operations. As shown in @fig-mlp, each layer transforms its input through matrix multiplication followed by element-wise activation:
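The connect-everything-to-everything structure can be sketched in a few lines of numpy. This is a minimal illustration, not the book's reference implementation: the weight shapes follow the MNIST framing above (784 flattened pixels in, here mapped to 10 class outputs), and ReLU stands in for whatever activation a real network would use.

```python
import numpy as np

def dense_layer(x, W, b):
    """One fully-connected layer: matrix multiply, then element-wise activation."""
    return np.maximum(0.0, x @ W + b)  # ReLU applied element-wise

# MNIST-shaped example: a batch of 2 flattened 28x28 images -> 10 outputs
rng = np.random.default_rng(0)
x = rng.random((2, 784))                  # 784 pixel intensities per image
W = rng.standard_normal((784, 10)) * 0.01 # one weight per input-output pair
b = np.zeros(10)

out = dense_layer(x, W, b)
assert out.shape == (2, 10)  # every pixel can influence every output
```

The single `x @ W` line is where the density lives: the weight matrix stores one parameter for every possible input-output connection, which is exactly why memory traffic, not arithmetic, often dominates in practice.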
@@ -133,15 +124,6 @@ This translation from mathematical abstraction to concrete computation exposes h
In the MNIST example, each output neuron requires 784 multiply-accumulate operations and at least 1,568 memory accesses (784 for inputs, 784 for weights). While actual implementations use sophisticated optimizations through libraries like [BLAS](https://www.netlib.org/blas/) or [cuBLAS](https://developer.nvidia.com/cublas), these fundamental patterns drive key system design decisions.
<!-- The computational mapping reveals several critical patterns that influence system design:
1. Each output depends on every input, creating an all-to-all communication pattern
2. Memory access is extensive and regular, with complete rows and columns being accessed
3. The basic operation (multiply-accumulate) repeats many times with different data
4. Computation can be parallelized across batches and output neurons
These patterns create both challenges in implementation and opportunities for optimization, which we'll examine in the next [section](#system-implications) -->
### System Implications
When analyzing how computational patterns impact computer systems, we typically examine three fundamental dimensions: memory requirements, computation needs, and data movement. This framework enables a systematic analysis of how algorithmic patterns influence system design decisions. We will use this framework for analyzing other network architectures, allowing us to compare and contrast their different characteristics.
@@ -185,7 +167,7 @@ We've just scratched the surface of neural networks. Now, you'll get to try and
While MLPs treat each input element independently, many real-world data types exhibit strong spatial relationships. Images, for example, derive their meaning from the spatial arrangement of pixels—a pattern of edges and textures that form recognizable objects. Audio signals show temporal patterns of frequency components, and sensor data often contains spatial or temporal correlations. These spatial relationships suggest that treating every input-output connection with equal importance, as MLPs do, might not be the most effective approach.
### Pattern Processing Need
### Pattern Processing Needs
Spatial pattern processing addresses scenarios where the relationship between data points depends on their relative positions or proximity. Consider processing a natural image: a pixel's relationship with its neighbors is important for detecting edges, textures, and shapes. These local patterns then combine hierarchically to form more complex features—edges form shapes, shapes form objects, and objects form scenes.
@@ -195,8 +177,6 @@ Taking image processing as an example, if we want to detect a cat in an image, c
This leads us to the convolutional neural network architecture (CNN), introduced by @lecun1989backpropagation. CNNs address spatial pattern processing through a fundamentally different connection pattern than MLPs. Instead of connecting every input to every output, CNNs use a local connection pattern where each output connects only to a small, spatially contiguous region of the input. This local receptive field moves across the input space, applying the same set of weights at each position—a process known as convolution.
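A direct (unoptimized) sketch of that sliding-window computation makes the contrast with the MLP concrete. The loop below is illustrative only; the toy image and edge-detector kernel are invented for the example, and real systems replace these loops with highly tuned library kernels.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one kernel across the image, reusing the same weights at every position."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output depends only on a small, spatially contiguous patch
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
edge_kernel = np.array([[1.0, -1.0]])             # simple horizontal edge detector
feat = conv2d(image, edge_kernel)
assert feat.shape == (5, 4)
```

Note that `kernel` appears once but is applied at every position: that weight sharing is what lets a CNN detect the same pattern anywhere in the input while storing far fewer parameters than a dense layer.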
<!-- These requirements create specific demands on our processing architecture. The system needs to support local connectivity to detect spatial patterns while enabling parameter sharing to recognize patterns independent of position. It must facilitate hierarchical processing to combine simple patterns into complex features, and efficiently handle shifting patterns across the input space. Unlike the dense connectivity of MLPs, spatial pattern processing suggests an architecture that explicitly encodes these spatial relationships while maintaining computational efficiency. This leads us to the convolutional neural network architecture, which we'll examine next. -->
### Algorithmic Structure
The core operation in a CNN can be expressed mathematically as:
@@ -309,7 +289,7 @@ The predictable spatial access pattern enables strategic data movement optimizat
While MLPs handle arbitrary relationships and CNNs process spatial patterns, many real-world problems involve sequential data where the order and relationship between elements over time matters. Text processing requires understanding how words relate to previous context, speech recognition needs to track how sounds form coherent patterns, and time-series analysis must capture how values evolve over time. These sequential relationships suggest that treating each time step independently misses crucial temporal patterns.
### Pattern Processing Need
### Pattern Processing Needs
Sequential pattern processing addresses scenarios where the meaning of current input depends on what came before it. Consider natural language processing: the meaning of a word often depends heavily on previous words in the sentence. The word "bank" means something different in "river bank" versus "bank account." Similarly, in speech recognition, a phoneme's interpretation often depends on surrounding sounds, and in financial forecasting, future predictions require understanding patterns in historical data.
@@ -407,17 +387,11 @@ For our example with a 128-dimensional hidden state, each time step must: load t
Different architectures handle this sequential data movement through specialized mechanisms. CPUs maintain weight matrices in cache while streaming through sequence elements and managing hidden state updates. GPUs employ memory architectures optimized for maintaining state information across sequential operations while processing multiple sequences in parallel. Deep learning frameworks orchestrate these movements by managing data transfers between time steps and optimizing batch operations.
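The sequential dependency described above can be seen in a bare-bones recurrent update. This is a sketch under assumed shapes (the 128-dimensional hidden state from the text, an arbitrary 32-dimensional input), not any particular framework's RNN cell:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    """One recurrent update: the new hidden state depends on the previous one."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b)

hidden, inputs = 128, 32  # 128 matches the hidden state size in the text
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((inputs, hidden)) * 0.01
W_hh = rng.standard_normal((hidden, hidden)) * 0.01
b = np.zeros(hidden)

h = np.zeros(hidden)
sequence = rng.random((10, inputs))        # 10 time steps
for x_t in sequence:                       # this loop cannot be parallelized:
    h = rnn_step(x_t, h, W_xh, W_hh, b)    # each step needs the previous h
assert h.shape == (hidden,)
```

The loop makes the system cost visible: `W_xh` and `W_hh` are reloaded (or held in cache) at every step, while `h` must be written back and read again before the next step can begin.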
<!-- ### Summary and Next Steps
The analysis of RNNs demonstrates how sequential pattern processing creates fundamentally different computational patterns from both the dense connectivity of MLPs and the spatial operations of CNNs. While MLPs process all inputs simultaneously and CNNs reuse weights across spatial positions, RNNs must handle temporal dependencies that create inherent sequential processing requirements. This sequential nature manifests in distinct system demands: memory systems must manage both weight reuse across time steps and hidden state updates, computation must balance sequential dependencies with parallel execution, and data movement centers around maintaining and updating state information efficiently.
These characteristics illustrate why different optimization strategies have evolved for RNN processing, and why certain applications began shifting toward alternative architectures like attention mechanisms, which we'll examine next. As we explore these newer architectural patterns, we'll see how they address some of the fundamental challenges of sequential processing while creating their own unique demands on computer systems. -->
## Attention Mechanisms: Dynamic Pattern Processing
While previous architectures process patterns in fixed ways—MLPs with dense connectivity, CNNs with spatial operations, and RNNs with sequential updates—many tasks require dynamic relationships between elements that change based on content. Language understanding, for instance, needs to capture relationships between words that depend on meaning rather than just position. Graph analysis requires understanding connections that vary by node. These dynamic relationships suggest we need an architecture that can learn and adapt its processing patterns based on the data itself.
### Pattern Processing Need
### Pattern Processing Needs
Dynamic pattern processing addresses scenarios where relationships between elements aren't fixed by architecture but instead emerge from content. Consider language translation: when translating "the bank by the river," understanding "bank" requires attending to "river," but in "the bank approved the loan," the important relationship is with "approved" and "loan." Unlike RNNs that process information sequentially or CNNs that use fixed spatial patterns, we need an architecture that can dynamically determine which relationships matter.
@@ -567,16 +541,6 @@ Finally, self-attention generates memory-intensive intermediate results. The att
These computational patterns create a unique profile for Transformer self-attention, distinct from previous architectures. The parallel nature of the computations makes Transformers well-suited for modern parallel processing hardware, but the quadratic complexity with sequence length poses challenges for processing long sequences. As a result, much research has focused on developing optimization techniques, such as sparse attention patterns or low-rank approximations, to address these challenges. Each of these optimizations presents its own trade-offs between computational efficiency and model expressiveness, a balance that must be carefully considered in practical applications.
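The quadratic intermediate discussed above is easy to see in a minimal scaled dot-product self-attention sketch (sequence length and model width are arbitrary toy values):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: weights are computed from the content itself."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (seq, seq): quadratic in length
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

seq_len, d_model = 6, 16
rng = np.random.default_rng(0)
X = rng.standard_normal((seq_len, d_model))
out, w = attention(X, X, X)                # self-attention: Q = K = V = X
assert w.shape == (seq_len, seq_len)       # the memory-intensive intermediate
assert np.allclose(w.sum(axis=-1), 1.0)
```

Unlike the RNN loop, nothing here is inherently sequential, which is why this pattern parallelizes so well; the price is the `(seq, seq)` weight matrix, whose storage and bandwidth cost grows quadratically with sequence length.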
<!-- ### Summary
Attention mechanisms and Transformers have ushered in a paradigm shift in neural network information processing. Unlike the fixed patterns of MLPs, CNNs, or RNNs, these architectures introduce dynamic, content-dependent computation, bringing both unprecedented capabilities and unique system challenges. The basic attention mechanism laid the groundwork for content-based weighting of information, allowing models to dynamically focus on relevant parts of the input. Transformers then extended this concept with self-attention, enabling each element in a sequence to interact with every other element, capturing complex dependencies regardless of positional distance.
This dynamic pattern processing manifests in distinctive system demands. Memory systems must contend with the quadratic scaling of attention weights with sequence length, a challenge that becomes particularly acute for longer sequences. Computation needs center around intensive matrix multiplications for query-key interactions and value combining, operations that benefit from parallelization but scale quadratically with sequence length. Data movement patterns revolve around the frequent access and update of dynamically generated weights and intermediate results, creating unique bandwidth requirements.
These characteristics elucidate both the strengths and challenges of Transformer architectures. Their ability to capture dynamic, long-range relationships has enabled breakthrough performance across a wide range of tasks, from natural language processing to computer vision and beyond. However, their computational intensity necessitates specialized hardware and optimized implementations to manage their resource demands effectively. This trade-off between expressive power and memory efficiency is a key consideration when choosing architectures for different tasks. While Transformers excel at capturing complex dependencies, their memory demands necessitate careful system design and optimization, especially for resource-constrained environments.
The advent of attention mechanisms and Transformers has opened new frontiers in machine learning, challenging our previous notions of architectural design and efficiency. As we continue to push the boundaries of what's possible with these models, a deep understanding of their computational patterns and system implications will be important in guiding future innovations in both model architecture and hardware design. -->
## Architectural Building Blocks
While we presented deep learning architectures as distinct approaches in the previous sections, they are better understood as compositions of fundamental building blocks that evolved over time. Much like complex LEGO structures built from basic bricks, modern neural networks combine and iterate on core computational patterns that emerged through decades of research [@lecun2015deep]. Each architectural innovation introduced new building blocks while finding novel ways to use existing ones.
@@ -679,22 +643,6 @@ Dynamic computation, where the operation itself depends on the input data, emerg
These primitives combine in sophisticated ways in modern architectures. A Transformer layer processing a sequence of 512 tokens demonstrates this clearly: it uses matrix multiplications for feature projections (512×512 operations implemented through tensor cores), may employ sliding windows for efficient attention over long sequences (using specialized memory access patterns for local regions), and requires dynamic computation for attention weights (computing 512×512 attention patterns at runtime). The way these primitives interact creates specific demands on system design---from memory hierarchy organization to computation scheduling.
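The 512-token example above can be made concrete with a rough FLOP count. The sketch below assumes a single attention head with `d_model = 512` and counts 2 FLOPs per multiply-add; it is an estimate for illustration, not a profiler measurement, and omits softmax, layer norms, and the feed-forward block:

```python
def self_attention_flops(seq_len, d_model):
    """Approximate FLOPs for single-head self-attention (2 FLOPs per multiply-add)."""
    qkv_proj = 3 * 2 * seq_len * d_model * d_model   # Q, K, V projections
    scores   = 2 * seq_len * seq_len * d_model       # Q @ K^T attention scores
    combine  = 2 * seq_len * seq_len * d_model       # weights @ V
    out_proj = 2 * seq_len * d_model * d_model       # output projection
    return qkv_proj + scores + combine + out_proj

print(f"{self_attention_flops(512, 512) / 1e9:.2f} GFLOPs")  # roughly 1.6 GFLOPs
```

Note which terms scale with `seq_len` squared: the projections grow linearly with sequence length, but the score and combine steps grow quadratically, so they dominate as sequences get longer.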
Different neural network architectures leverage these core computational primitives in varying ways, as illustrated in @tbl-nn-arch-primitives:
+----------------+-----------------------+----------------------------+-----------------------------+--------------------------+
| Primitive Type | MLP | CNN | RNN | Transformer |
+:===============+:======================+:===========================+:============================+:=========================+
| Computational | Matrix Multiplication | Convolution (Matrix Mult.) | Matrix Mult. + State Update | Matrix Mult. + Attention |
+----------------+-----------------------+----------------------------+-----------------------------+--------------------------+
| Memory Access | Sequential | Strided | Sequential + Random | Random (Attention) |
+----------------+-----------------------+----------------------------+-----------------------------+--------------------------+
| Data Movement | Broadcast | Sliding Window | Sequential | Broadcast + Gather |
+----------------+-----------------------+----------------------------+-----------------------------+--------------------------+
: Utilization of primitives across neural network architectures. {#tbl-nn-arch-primitives .hover .striped}
@tbl-nn-arch-primitives highlights how the fundamental operations we've discussed manifest in different architectures, showcasing both the commonalities and differences in their computational needs. For instance, while all architectures rely on matrix multiplication as a core computational primitive, they differ significantly in their memory access and data movement patterns.
The building blocks we've discussed help explain why certain hardware features exist (like tensor cores for matrix multiplication) and why software frameworks organize computations in particular ways (like batching similar operations together). As we move from computational primitives to consider memory access and data movement patterns, it's important to recognize how these fundamental operations shape the demands placed on memory systems and data transfer mechanisms. The way computational primitives are implemented and combined has direct implications for how data needs to be stored, accessed, and moved within the system.
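The distinction between strided (CNN-style) and gather (attention-style) access in @tbl-nn-arch-primitives can be sketched in a few lines of NumPy. The array and indices here are toy values chosen for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(10)

# Strided access: each output reads a contiguous window of 3 inputs,
# the pattern a CNN's sliding window produces (no data is copied here)
windows = sliding_window_view(x, 3)   # shape (8, 3)

# Gather access: reads driven by indices computed at runtime,
# the random pattern attention produces
idx = np.array([7, 0, 3])
gathered = x[idx]
```

Hardware handles these very differently: strided windows are friendly to caches and prefetchers, while runtime-computed gathers defeat prefetching and stress memory bandwidth, which is one reason attention-heavy workloads push system designers toward high-bandwidth memory.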
### Memory Access Primitives

View File

@@ -11,12 +11,3 @@ Imagine a chapter that writes itself and adapts to your curiosity, generating ne
This chapter will transform how you read and learn, dynamically generating content as you go. While we fine-tune this exciting new feature, we invite you to get ready for an educational experience that's as dynamic and unique as you are. Mark your calendars for the big reveal and bookmark this page.
_The future of **generative learning** is here! — Vijay Janapa Reddi_
<!--
::: {.callout-tip}
## Learning Objectives
* *Coming soon.*
:::
-->

View File

@@ -117,15 +117,15 @@ A comprehensive list of all GitHub contributors, automatically updated with each
<td align="center" valign="top" width="20%"><a href="https://github.com/Mjrovai"><img src="https://avatars.githubusercontent.com/Mjrovai?s=100" width="100px;" alt="Marcelo Rovai"/><br /><sub><b>Marcelo Rovai</b></sub></a><br /></td>
</tr>
<tr>
<td align="center" valign="top" width="20%"><a href="https://github.com/Sara-Khosravi"><img src="https://avatars.githubusercontent.com/Sara-Khosravi?s=100" width="100px;" alt="Sara Khosravi"/><br /><sub><b>Sara Khosravi</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/kai4avaya"><img src="https://avatars.githubusercontent.com/kai4avaya?s=100" width="100px;" alt="Kai Kleinbard"/><br /><sub><b>Kai Kleinbard</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/Sara-Khosravi"><img src="https://avatars.githubusercontent.com/Sara-Khosravi?s=100" width="100px;" alt="Sara Khosravi"/><br /><sub><b>Sara Khosravi</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/V0XNIHILI"><img src="https://avatars.githubusercontent.com/V0XNIHILI?s=100" width="100px;" alt="Douwe den Blanken"/><br /><sub><b>Douwe den Blanken</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/shanzehbatool"><img src="https://avatars.githubusercontent.com/shanzehbatool?s=100" width="100px;" alt="shanzehbatool"/><br /><sub><b>shanzehbatool</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/mpstewart1"><img src="https://avatars.githubusercontent.com/mpstewart1?s=100" width="100px;" alt="Matthew Stewart"/><br /><sub><b>Matthew Stewart</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/eliasab16"><img src="https://avatars.githubusercontent.com/eliasab16?s=100" width="100px;" alt="Elias"/><br /><sub><b>Elias</b></sub></a><br /></td>
</tr>
<tr>
<td align="center" valign="top" width="20%"><a href="https://github.com/eliasab16"><img src="https://avatars.githubusercontent.com/eliasab16?s=100" width="100px;" alt="Elias"/><br /><sub><b>Elias</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/JaredP94"><img src="https://avatars.githubusercontent.com/JaredP94?s=100" width="100px;" alt="Jared Ping"/><br /><sub><b>Jared Ping</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/mpstewart1"><img src="https://avatars.githubusercontent.com/mpstewart1?s=100" width="100px;" alt="Matthew Stewart"/><br /><sub><b>Matthew Stewart</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/ishapira1"><img src="https://avatars.githubusercontent.com/ishapira1?s=100" width="100px;" alt="Itai Shapira"/><br /><sub><b>Itai Shapira</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/harvard-edge/cs249r_book/graphs/contributors"><img src="https://www.gravatar.com/avatar/8863743b4f26c1a20e730fcf7ebc3bc0?d=identicon&s=100?s=100" width="100px;" alt="Maximilian Lam"/><br /><sub><b>Maximilian Lam</b></sub></a><br /></td>
<td align="center" valign="top" width="20%"><a href="https://github.com/jaysonzlin"><img src="https://avatars.githubusercontent.com/jaysonzlin?s=100" width="100px;" alt="Jayson Lin"/><br /><sub><b>Jayson Lin</b></sub></a><br /></td>

View File

@@ -13,6 +13,22 @@ We've created this open-source book to demystify the process of building efficie
As a living and breathing resource, this book is a continual work in progress, reflecting the ever-evolving nature of machine learning systems. Advancements in the ML landscape drive our commitment to keeping this resource updated with the latest insights, techniques, and best practices. We warmly invite you to join us on this journey by contributing your expertise, feedback, and ideas.
## Global Reach
Thank you to all our readers and visitors. Your engagement with the material keeps us motivated.
```{=html}
<div style="position: relative; padding-top: 56.25%; /* Aspect Ratio 16:9 */">
<iframe
src="https://lookerstudio.google.com/embed/reporting/e7192975-a8a0-453d-b6fe-1580ac054dbf/page/0pNbE"
frameborder="0"
style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;"
allowfullscreen
sandbox="allow-storage-access-by-user-activation allow-scripts allow-same-origin allow-popups allow-popups-to-escape-sandbox">
</iframe>
</div>
```
## Why We Wrote This Book
While there are plenty of resources that focus on the algorithmic side of machine learning, resources on the systems side of things are few and far between. This gap inspired us to create this book—a resource dedicated to the principles and practices of building efficient and scalable ML systems.
@@ -25,4 +41,4 @@ This is a collaborative project, and your input matters! If you'd like to contri
## What's Next?
If you're ready to dive deeper into the book's structure, learning objectives, and practical use, visit the [About the Book](contents/frontmatter/about/about.qmd# about-the-book-unnumbered) section for more details.
If you're ready to dive deeper into the book's structure, learning objectives, and practical use, visit the [About the Book](contents/frontmatter/about/about.qmd#about-the-book-unnumbered) section for more details.

File diff suppressed because one or more lines are too long