This commit is contained in:
Vijay Janapa Reddi
2025-03-27 15:38:15 -04:00
parent 53ea90ddc5
commit 900a8fe492
5 changed files with 11 additions and 11 deletions

View File

@@ -1979,7 +1979,7 @@ Execution control operations coordinate computation across multiple execution un
#### Basic Numerical Operations
- Building upon hardware abstractions, frameworks implement fundamental numerical operations that form the building blocks of machine learning computations. These operations must balance mathematical precision with computational efficiency. General Matrix Multiply (GEMM) operations, which dominate the computational cost of most machine learning workloads. GEMM operations follow the pattern C = αAB + βC, where A, B, and C are matrices, and α and β are scaling factors.
+ Building upon hardware abstractions, frameworks implement fundamental numerical operations that form the building blocks of machine learning computations. These operations must balance mathematical precision with computational efficiency. Chief among them are General Matrix Multiply (GEMM) operations, which dominate the computational cost of most machine learning workloads. GEMM operations follow the pattern $C = \alpha AB + \beta C$, where $A$, $B$, and $C$ are matrices, and $\alpha$ and $\beta$ are scaling factors.
The implementation of GEMM operations requires sophisticated optimization techniques. These include blocking[^fn-frameworks-10] for cache efficiency, where matrices are divided into smaller tiles that fit in cache memory; loop unrolling[^fn-frameworks-11] to increase instruction-level parallelism; and specialized implementations for different matrix shapes and sparsity patterns. For example, fully-connected neural network layers typically use regular dense GEMM operations, while convolutional layers often employ specialized GEMM variants that exploit input locality patterns.
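The blocking technique described above can be sketched in a few lines. This is an illustrative NumPy version (the function name and tile size are ours, not a framework API); production frameworks dispatch to tuned BLAS kernels instead:

```python
import numpy as np

def gemm_blocked(alpha, A, B, beta, C, tile=32):
    """Cache-blocked GEMM sketch: returns alpha * A @ B + beta * C.

    `tile` is a hypothetical block size chosen so three tiles fit in cache.
    """
    n, k = A.shape
    _, m = B.shape
    # Scale C by beta first, then accumulate the blocked products.
    out = beta * C.astype(np.float64)
    for i0 in range(0, n, tile):
        for k0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                # One tile x tile block at a time keeps operands cache-resident.
                out[i0:i0+tile, j0:j0+tile] += alpha * (
                    A[i0:i0+tile, k0:k0+tile] @ B[k0:k0+tile, j0:j0+tile]
                )
    return out
```

Summing the per-tile partial products over `k0` reproduces the full inner dimension, so the result matches an unblocked GEMM; only the memory-access order changes.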

View File

@@ -172,7 +172,7 @@ MobileNet V1 and MobileNet V2 aim at mobile efficiency and embedded vision appli
Although the base MobileNet architecture is already tiny and has low latency, many times, a specific use case or application may require the model to be even smaller and faster. MobileNets introduces a straightforward parameter $\alpha$ (alpha) called width multiplier to construct these smaller, less computationally expensive models. The role of the width multiplier $\alpha$ is that of thinning a network uniformly at each layer.
- Edge Impulse Studio can use both MobileNetV1 ($96\times 96$ images) and V2 ($96\times 96$ or $160\times 160$ images), with several different **α** values (from 0.05 to 1.0). For example, you will get the highest accuracy with V2, $160\times 160$ images, and $\alpha=1.0$. Of course, there is a trade-off. The higher the accuracy, the more memory (around 1.3 MB RAM and 2.6 MB ROM) will be needed to run the model, implying more latency. The smaller footprint will be obtained at the other extreme with MobileNetV1 and $\alpha=0.10$ (around 53.2 K RAM and 101 K ROM).
+ Edge Impulse Studio can use both MobileNetV1 ($96\times 96$ images) and V2 ($96\times 96$ or $160\times 160$ images), with several different **$\alpha$** values (from 0.05 to 1.0). For example, you will get the highest accuracy with V2, $160\times 160$ images, and $\alpha=1.0$. Of course, there is a trade-off. The higher the accuracy, the more memory (around 1.3 MB RAM and 2.6 MB ROM) will be needed to run the model, implying more latency. The smaller footprint will be obtained at the other extreme with MobileNetV1 and $\alpha=0.10$ (around 53.2 K RAM and 101 K ROM).
\noindent
![](images/jpg/image27.jpg){width="80%" fig-align="center"}
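The uniform-thinning idea behind the width multiplier can be sketched as follows. This is a simplified illustration (the real MobileNet implementation also rounds channel counts to a hardware-friendly multiple, e.g. of 8):

```python
def thinned_channels(base_channels, alpha):
    """Scale a layer's channel count by the width multiplier alpha.

    Simplified sketch: floor the scaled count and keep at least one channel.
    """
    return max(1, int(base_channels * alpha))
```

Because a standard convolution's cost grows with the product of input and output channels, scaling both by $\alpha$ shrinks computation and parameters by roughly $\alpha^2$, which is why small $\alpha$ values fit in tens of kilobytes of RAM.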

View File

@@ -50,7 +50,7 @@ Our KWS application will recognize four classes of sound:
- **YES** (Keyword 1)
- **NO** (Keyword 2)
- **NOISE** (no words spoken; only background noise is present)
- - **UNKNOWNN** (a mix of different words than YES and NO)
+ - **UNKNOWN** (a mix of words other than YES and NO)
> For real-world projects, it is always advisable to include other sounds besides the keywords, such as "Noise" (or Background) and "Unknown."
@@ -170,7 +170,7 @@ We could keep the default parameter values, but we will use the DSP `Autotune pa
\noindent
![](images/jpg/ei_MFCC.jpg)
- We will take the `Raw features` (our 1-second, 16 KHz sampled audio data) and use the MFCC processing block to calculate the `Processed features`. For every 16,000 raw features (16,000 $\times$ 1 second), we will get 637 processed features $(13\times 49)$.$
+ We will take the `Raw features` (our 1-second, 16 KHz sampled audio data) and use the MFCC processing block to calculate the `Processed features`. For every 16,000 raw features (16,000 $\times$ 1 second), we will get 637 processed features $(13\times 49)$.
\noindent
![](images/jpg/MFCC.jpg){width="90%" fig-align="center"}
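The $637 = 13 \times 49$ figure follows from how the 1-second window is framed. As a sketch, assuming a 0.032 s frame length and a 0.02 s stride at 16 kHz (illustrative values; the actual parameters here come from the DSP Autotune step):

```python
def mfcc_feature_count(n_samples, frame_len, frame_stride, n_coeffs):
    """Total MFCC features: number of frames times coefficients per frame."""
    n_frames = (n_samples - frame_len) // frame_stride + 1  # sliding-window count
    return n_frames * n_coeffs
```

With 16,000 samples, a 512-sample frame, a 320-sample stride, and 13 cepstral coefficients, this yields 49 frames and 637 features.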
@@ -223,7 +223,7 @@ Testing the model with the data reserved for training (Test Data), we got an acc
\noindent
![](images/jpg/test.jpg){width="70%" fig-align="center"}
- Inspecting the F1 score, we can see that for YES, we got 0.90, an excellent result since we expect to use this keyword as the primary "trigger" for our KWS project. The worst result (0.70) is for UNKNOWNNN, which is OK.
+ Inspecting the F1 score, we can see that for YES, we got 0.90, an excellent result since we expect to use this keyword as the primary "trigger" for our KWS project. The worst result (0.70) is for UNKNOWN, which is OK.
For NO, we got 0.72, which was expected, but to improve this result, we can move the samples that were not correctly classified to the training dataset and then repeat the training process.
@@ -259,7 +259,7 @@ Press the reset button twice to put the NiclaV in boot mode, upload the sketch t
Now that we know the model is working since it detects our keywords, let's modify the code to see the result with the NiclaV completely offline (disconnected from the PC and powered by a battery, a power bank, or an independent 5V power supply).
- The idea is that whenever the keyword YES is detected, the Green LED will light; if a NO is heard, the Red LED will light, if it is a UNKNOWNN, the Blue LED will light; and in the presence of noise (No Keyword), the LEDs will be OFF.
+ The idea is that whenever the keyword YES is detected, the Green LED will light; if a NO is heard, the Red LED will light; if an UNKNOWN is detected, the Blue LED will light; and in the presence of noise (no keyword), the LEDs will be OFF.
We should modify one of the code examples. Let's do it now with the `nicla-vision_microphone_continuous`.
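The actual modification goes into the Arduino C++ sketch, but the class-to-LED decision table described above is small enough to check off-device first. A sketch of that logic (label strings follow this project's class names; the color names stand in for the NiclaV's RGB LED pins):

```python
def led_for_label(label):
    """Map the top-scoring class label to the LED to light.

    Returns None for NOISE (or any unexpected label), meaning all LEDs off.
    """
    return {"yes": "green", "no": "red", "unknown": "blue"}.get(label)
```

In the sketch itself, the same mapping becomes an if/else chain over the classifier result's labels, driving `LEDG`, `LEDR`, and `LEDB`.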

View File

@@ -126,7 +126,7 @@ We will simulate container (or better package) transportation through different
\noindent
![](images/jpg/classes.jpg){width="80%" fig-align="center"}
- From the above images, we can define for our simulation that primarily horizontal movements ($x$ or y axis) should be associated with the "Terrestrial class," Vertical movements ($z$-axis) with the "Lift Class," no activity with the "Idle class," and movement on all three axes to [Maritime class.](https://www.containerhandbuch.de/chb_e/stra/index.html?/chb_e/stra/stra_02_03_03.htm)
+ From the above images, we can define for our simulation that primarily horizontal movements ($x$- or $y$-axis) should be associated with the "Terrestrial" class, vertical movements ($z$-axis) with the "Lift" class, no activity with the "Idle" class, and movement on all three axes with the [Maritime class.](https://www.containerhandbuch.de/chb_e/stra/index.html?/chb_e/stra/stra_02_03_03.htm)
\noindent
![](images/jpg/classes_mov_def.jpg){width="80%" fig-align="center"}
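The per-axis class definitions above can be written out as a small decision rule. This is only an illustrative sketch of the simulation's labeling logic (the activity measure and threshold are our assumptions, e.g. the standard deviation of acceleration per axis):

```python
def motion_class(x_act, y_act, z_act, thr=0.5):
    """Map per-axis activity levels to one of the four simulation classes."""
    x, y, z = x_act > thr, y_act > thr, z_act > thr
    if not (x or y or z):
        return "idle"          # no activity on any axis
    if x and y and z:
        return "maritime"      # movement on all three axes
    if z and not (x or y):
        return "lift"          # vertical-only movement
    return "terrestrial"       # primarily horizontal (x and/or y) movement
```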

View File

@@ -85,13 +85,13 @@ Remember to `[Save parameters]`. This will generate the features to be used in t
In 2017, Google introduced [MobileNetV1,](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html) a family of general-purpose computer vision neural networks designed with mobile devices in mind to support classification, detection, and more. MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of various use cases.
- Although the base MobileNet architecture is already tiny and has low latency, many times, a specific use case or application may require the model to be smaller and faster. MobileNet introduces a straightforward parameter α (alpha) called width multiplier to construct these smaller, less computationally expensive models. The role of the width multiplier α is to thin a network uniformly at each layer.
+ Although the base MobileNet architecture is already tiny and has low latency, many times, a specific use case or application may require the model to be smaller and faster. MobileNet introduces a straightforward parameter $\alpha$ (alpha) called width multiplier to construct these smaller, less computationally expensive models. The role of the width multiplier $\alpha$ is to thin a network uniformly at each layer.
- Edge Impulse Studio has **MobileNet V1 (96x96 images)** and **V2 (96x96 and 160x160 images)** available, with several different **α** values (from 0.05 to 1.0). For example, you will get the highest accuracy with V2, 160x160 images, and α=1.0. Of course, there is a trade-off. The higher the accuracy, the more memory (around 1.3M RAM and 2.6M ROM) will be needed to run the model, implying more latency.
+ Edge Impulse Studio has **MobileNet V1 (96x96 images)** and **V2 (96x96 and 160x160 images)** available, with several different **$\alpha$** values (from 0.05 to 1.0). For example, you will get the highest accuracy with V2, 160x160 images, and $\alpha$=1.0. Of course, there is a trade-off. The higher the accuracy, the more memory (around 1.3M RAM and 2.6M ROM) will be needed to run the model, implying more latency.
- The smaller footprint will be obtained at another extreme with **MobileNet V1** and α=0.10 (around 53.2K RAM and 101K ROM).
+ The smaller footprint will be obtained at the other extreme with **MobileNet V1** and $\alpha$=0.10 (around 53.2K RAM and 101K ROM).
- For this first pass, we will use **MobileNet V1** and α=0.10.
+ For this first pass, we will use **MobileNet V1** and $\alpha$=0.10.
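To see why $\alpha=0.10$ shrinks the footprint so dramatically, note that a standard convolution's multiply count scales with both the thinned input and output channel counts, i.e. roughly with $\alpha^2$. A simplified sketch (it ignores MobileNet's depthwise-separable structure and channel rounding, and serves only to illustrate the scaling):

```python
def conv_multiplies(h, w, k, c_in, c_out, alpha):
    """Multiplies for one k x k standard convolution on an h x w feature map,
    with both channel counts thinned by the width multiplier alpha."""
    ci = max(1, int(c_in * alpha))
    co = max(1, int(c_out * alpha))
    return h * w * k * k * ci * co
```

Halving $\alpha$ quarters the multiply count of such a layer, so $\alpha=0.10$ cuts it by roughly two orders of magnitude relative to $\alpha=1.0$.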
### Training