mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-04 00:29:10 -05:00
Created subfolders within images/ based on filetype
Better organization for the future, e.g., to build a PDF, because images need to be pulled from the right filetype for quality rendering. The subfolders are not used yet, but the organization doesn't hurt and only makes the repository cleaner.
# AI Frameworks
In this chapter, we explore the landscape of AI frameworks that serve as the foundation for developing machine learning systems. AI frameworks provide the essential tools, libraries, and environments necessary to design, train, and deploy machine learning models. We trace the evolutionary trajectory of these frameworks, dissect the workings of TensorFlow, and examine the core components and advanced features that define modern frameworks.
In recent years, the framework landscape has converged. @fig-ml-framework shows that TensorFlow and PyTorch have become the overwhelmingly dominant ML frameworks, together representing more than 95% of framework usage in research and production. Keras was integrated into TensorFlow in 2019; Preferred Networks transitioned Chainer to PyTorch in 2019; and Microsoft stopped actively developing CNTK in 2022 in favor of supporting PyTorch on Windows.
{#fig-ml-framework}
However, a one-size-fits-all approach does not work well across the spectrum from cloud to tiny edge devices. Different frameworks embody different philosophies around graph execution, declarative versus imperative APIs, and more. A declarative API defines what the program should compute, while an imperative API specifies how to compute it step by step. For instance, TensorFlow uses graph execution and declarative-style modeling, while PyTorch adopts eager execution and imperative modeling for more Pythonic flexibility. Each approach carries tradeoffs that we discuss later in the Basic Components section.
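
The contrast between the two styles can be illustrated with a minimal pure-Python sketch. This is illustrative only, not actual TensorFlow or PyTorch code; the `Node` class and function names are hypothetical:

```python
# Imperative (eager) style, as in PyTorch: each operation runs immediately.
def eager_multiply(x, y):
    return x * y  # computed the moment this line executes

# Declarative (graph) style, as in TensorFlow 1.x: first describe the
# computation as a graph, then execute it with concrete inputs.
class Node:
    def __init__(self, fn, *inputs):
        self.fn, self.inputs = fn, inputs

    def run(self, feed):
        args = [i.run(feed) if isinstance(i, Node) else feed[i]
                for i in self.inputs]
        return self.fn(*args)

# Build the graph for z = x * y; nothing is computed yet.
z = Node(lambda a, b: a * b, "x", "y")

print(eager_multiply(3, 4))     # computes immediately -> 12
print(z.run({"x": 3, "y": 4}))  # computes only at execution time -> 12
```

Deferring execution lets a framework inspect and optimize the whole graph before running it; eager execution gives up that global view in exchange for easier debugging.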
9. [TensorFlow Extended (TFX)](https://www.tensorflow.org/tfx): end-to-end platform designed to deploy and manage machine learning pipelines in production settings. TFX encompasses components for data validation, preprocessing, model training, validation, and serving.
)](images/png/tensorflow.png){#fig-tensorflow-architecture}
TensorFlow was developed to address the limitations of DistBelief [@abadi2016tensorflow]---the framework in use at Google from 2011 to 2015---by providing flexibility along three axes: 1) defining new layers, 2) refining training algorithms, and 3) defining new training algorithms. To understand what limitations in DistBelief led to the development of TensorFlow, we will first give a brief overview of the Parameter Server Architecture that DistBelief employed [@dean2012large].
To understand tensors, let us start from familiar concepts in linear algebra. As demonstrated in @fig-tensor-data-structure, a vector can be represented as a stack of numbers in a 1-dimensional array. Matrices follow the same idea: one can think of a matrix as many vectors stacked on each other, making it 2-dimensional. Higher-dimensional tensors work the same way; a 3-dimensional tensor is simply a set of matrices stacked on top of each other in another direction. Vectors and matrices can therefore be considered special cases of tensors, of rank 1 and rank 2 respectively.
{#fig-tensor-data-structure}
Formally, in machine learning, a tensor is a multi-dimensional array of numbers, and the number of dimensions defines its rank. As a generalization of linear algebra, the study of tensors is called multilinear algebra. There are noticeable similarities between matrices and higher-rank tensors. First, definitions from linear algebra, such as eigenvalues, eigenvectors, and rank (in the linear algebra sense), can be extended to tensors. Furthermore, with the way we have defined tensors, it is possible to turn a higher-dimensional tensor into a matrix. This turns out to be critical in practice: multiplications of higher-dimensional tensors are often performed by first converting them into matrices.
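
As a concrete illustration of flattening a higher-dimensional tensor into a matrix for multiplication, here is a small NumPy sketch; the shapes are arbitrary examples:

```python
import numpy as np

# A rank-3 tensor: a batch of 2 matrices, each 3x4 (batch, rows, cols).
t = np.arange(24).reshape(2, 3, 4)

# Collapsing the batch and row dimensions yields a (2*3) x 4 matrix, so an
# ordinary matrix multiplication can be reused -- the same lowering many
# frameworks perform internally.
m = t.reshape(2 * 3, 4)
w = np.ones((4, 5))               # a 4x5 weight matrix

out = (m @ w).reshape(2, 3, 5)    # multiply, then restore the batch dimension
print(out.shape)                  # (2, 3, 5)
```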
For example, a node might represent a matrix multiplication operation, taking two input matrices (or tensors) and producing an output matrix (or tensor). To visualize this, consider the simple example in @fig-computational-graph. The directed acyclic graph computes $z = x \times y$, where each variable is simply a number.
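
A graph node for this multiplication can be sketched in a few lines of plain Python. The `Mul` class below is hypothetical, but it mirrors how frameworks cache forward-pass inputs so the same node can later produce gradients:

```python
# A computational-graph node for z = x * y. Caching the inputs during the
# forward pass is what lets the node compute gradients on the backward pass.
class Mul:
    def forward(self, x, y):
        self.x, self.y = x, y          # remember inputs for backward
        return x * y

    def backward(self, dz):
        # By the product rule: dz/dx = y and dz/dy = x, each scaled by the
        # gradient flowing in from downstream.
        return dz * self.y, dz * self.x

node = Mul()
z = node.forward(3.0, 4.0)             # forward pass: z = 12.0
dx, dy = node.backward(1.0)            # backward pass: (4.0, 3.0)
```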
{#fig-computational-graph width="50%" height="auto" align="center"}
Under the hood, computational graphs represent abstractions for common layers like convolutional, pooling, recurrent, and dense layers, with data such as activations, weights, and biases represented as tensors. Convolutional layers form the backbone of CNN models for computer vision, detecting spatial patterns in input data through learned filters. Recurrent layers like LSTMs and GRUs enable processing sequential data for tasks like language translation. Attention layers are used in transformers to draw global context from the entire input.
Today, NVIDIA GPUs dominate training, aided by software libraries like [CUDA](https://developer.nvidia.com/cuda-toolkit), [cuDNN](https://developer.nvidia.com/cudnn), and [TensorRT](https://developer.nvidia.com/tensorrt). Frameworks also tend to include optimizations to maximize performance on these hardware types, like pruning unimportant connections and fusing layers. Combining these techniques with hardware acceleration provides greater efficiency. For inference, hardware is increasingly moving towards optimized ASICs and SoCs. Google's TPUs accelerate models in data centers. Apple, Qualcomm, and others now produce AI-focused mobile chips. The NVIDIA Jetson family targets autonomous robots.
)](images/png/hardware_accelerator.png){#fig-hardware-accelerator}
## Advanced Features {#sec-ai_frameworks-advanced}
The heterogeneity of device resources is another hurdle. Devices participating in federated learning can have varying computational power and memory capacity, which makes it challenging to design algorithms that are efficient across all of them. Nor are privacy and security guaranteed: techniques such as gradient inversion attacks can extract information about the training data from the shared model parameters. Despite these challenges, the potential benefits keep federated learning a popular research area. Open-source frameworks such as [Flower](https://flower.dev/) have been developed to make it simpler to implement federated learning with a variety of machine learning frameworks.
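
The core federated averaging (FedAvg) loop can be sketched in plain Python; the toy 1-D linear model and squared-error loss below are illustrative assumptions, not code from any particular framework:

```python
# Federated averaging sketch: each client takes a gradient step on its own
# data; the server only ever sees parameter updates, never the raw data.
def local_update(w, data, lr=0.1):
    # One gradient-descent step on the squared error of a 1-D linear model.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def federated_round(w, client_datasets):
    updates = [local_update(w, d) for d in client_datasets]
    return sum(updates) / len(updates)      # server-side averaging

# Two clients whose private (x, y) pairs are both consistent with w = 2.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(50):
    w = federated_round(w, clients)
print(round(w, 2))                          # converges toward 2.0
```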
)](images/png/federated_learning.png){#fig-federated-learning}
## Framework Specialization
Choosing the right machine learning framework for a given application requires carefully evaluating the model, the target hardware, and the surrounding software. By analyzing these three aspects, ML engineers can select the optimal framework and customize it as needed for efficient, performant on-device ML applications. The goal is to balance model complexity, hardware limitations, and software integration to design a tailored ML pipeline for embedded and edge devices.
{#fig-tf-comparison width="100%" height="auto" align="center" caption="TensorFlow Framework Comparison - General"}
### Model

TensorFlow supports significantly more ops than TensorFlow Lite and TensorFlow Lite Micro.

### Software
{#fig-tf-sw-comparison width="100%" height="auto" align="center" caption="TensorFlow Framework Comparison - Software"}

Unlike TensorFlow and TensorFlow Lite, TensorFlow Lite Micro does not rely on an operating system, which reduces memory overhead, speeds up startup, and lowers energy consumption (@fig-tf-sw-comparison). TensorFlow Lite Micro can instead be used in conjunction with real-time operating systems (RTOS) like FreeRTOS, Zephyr, and Mbed OS. TensorFlow Lite and TensorFlow Lite Micro support model memory mapping, allowing models to be accessed directly from flash storage rather than loaded into RAM, whereas TensorFlow does not. TensorFlow and TensorFlow Lite support accelerator delegation to schedule code to different accelerators, whereas TensorFlow Lite Micro does not, since embedded systems tend not to have a rich array of specialized accelerators.
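
Model memory mapping can be demonstrated with Python's standard `mmap` module; the four-float "model file" below is a stand-in for a real serialized model:

```python
import mmap
import os
import struct
import tempfile

# Write a tiny fake "model file" containing four float32 weights.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(struct.pack("4f", 0.5, -1.0, 2.0, 3.5))

# Memory-map the file: bytes are paged in from storage on demand instead of
# the whole file being copied into RAM up front.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    third = struct.unpack_from("f", mm, 2 * 4)[0]   # read only one weight
    mm.close()

print(third)   # 2.0
```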
### Hardware
{#fig-tf-hw-comparison width="100%" height="auto" align="center" caption="TensorFlow Framework Comparison - Hardware"}
TensorFlow Lite and TensorFlow Lite Micro have significantly smaller base binary sizes and memory footprints than TensorFlow (@fig-tf-hw-comparison). For example, a typical TensorFlow Lite Micro binary is less than 200 KB, whereas TensorFlow is much larger. This reflects the resource-constrained environments of embedded systems. TensorFlow supports x86, TPUs, and GPUs from NVIDIA, AMD, and Intel. TensorFlow Lite supports Arm Cortex-A and x86 processors commonly used in mobile phones and tablets, and strips out all training logic that is not necessary for on-device deployment. TensorFlow Lite Micro supports microcontroller-focused Arm Cortex-M cores like the M0, M3, M4, and M7, as well as DSPs like Hexagon and SHARC and MCUs like the STM32, NXP Kinetis, and Microchip AVR.
Currently, the ML system stack consists of four abstractions (@fig-mlsys-stack), namely (1) computational graphs, (2) tensor programs, (3) libraries and runtimes, and (4) hardware primitives.
{#fig-mlsys-stack align="center" caption="Four Abstractions in Current ML System Stack"}
This has led to vertical (i.e. between abstraction levels) and horizontal (i.e. library-driven vs. compilation-driven approaches to tensor computation) boundaries, which hinder innovation for ML. Future work in ML frameworks can look toward breaking these boundaries. In December 2021, [Apache TVM](https://tvm.apache.org/2021/12/15/tvm-unity) Unity was proposed, which aimed to facilitate interactions between the different abstraction levels (as well as the people behind them, such as ML scientists, ML engineers, and hardware engineers) and co-optimize decisions in all four abstraction levels.