# CV on Nicla Vision {.unnumbered}

As we initiate our studies into embedded machine learning, or tinyML, it's impossible to overlook the transformative impact of Computer Vision (CV) and Artificial Intelligence (AI) on our lives. These two intertwined disciplines redefine what machines can perceive and accomplish, from autonomous vehicles and robotics to healthcare and surveillance.

More and more, we are facing an artificial intelligence (AI) revolution where, as stated by Gartner, **Edge AI** has a very high impact potential, and **it is happening now**!

At the "bull's-eye" of the emerging technologies radar is *Edge Computer Vision*, and when we talk about Machine Learning (ML) applied to vision, the first thing that comes to mind is **Image Classification**, a kind of ML "Hello World"!

This exercise will explore a computer vision project utilizing Convolutional Neural Networks (CNNs) for real-time image classification. Leveraging TensorFlow's robust ecosystem, we'll implement a pre-trained MobileNet model and adapt it for edge deployment. The focus will be on optimizing the model to run efficiently on resource-constrained hardware without sacrificing accuracy.

We'll employ techniques like quantization and pruning to reduce the computational load. By the end of this tutorial, you'll have a working prototype capable of classifying images in real time, all running on a low-power embedded system based on the Arduino Nicla Vision board.

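To give a flavor of what quantization looks like in practice, here is a minimal sketch of post-training int8 quantization with the TensorFlow Lite converter. It is not the pipeline Edge Impulse runs for this project; the `trained_model` directory and the random representative images are placeholders for illustration only.

```python
import tensorflow as tf
import numpy as np

# Hypothetical path to a SavedModel exported after training.
saved_model_dir = "trained_model"

def representative_images():
    # Yield a few samples shaped like the model input (96x96 RGB, float32).
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_images
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# The resulting .tflite file stores weights and activations as 8-bit integers.
tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```
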
## Computer Vision

At its core, computer vision aims to enable machines to interpret and make decisions based on visual data from the world, essentially mimicking the capability of the human optical system. Conversely, AI is a broader field encompassing machine learning, natural language processing, and robotics, among other technologies. When you bring AI algorithms into computer vision projects, you supercharge the system's ability to understand, interpret, and react to visual stimuli.

When discussing Computer Vision projects applied to embedded devices, the most common applications that come to mind are *Image Classification* and *Object Detection*.

Both models can be implemented on tiny devices like the Arduino Nicla Vision and used in real projects. Let's start with the first one.

## Image Classification Project

The first step in any ML project is to define our goal. In this case, it is to detect and classify two specific objects present in one image. For this project, we will use two small toys: a *robot* and a small Brazilian parrot (named *Periquito*). Also, we will collect images of a *background* where those two objects are absent.

## Data Collection

Once you have defined your Machine Learning project goal, the next and most crucial step is dataset collection. You can use the Edge Impulse Studio, the OpenMV IDE we installed, or even your phone for the image capture. Here, we will use the OpenMV IDE.

**Collecting Dataset with OpenMV IDE**

First, create a folder on your computer where your data will be saved, for example, "data." Next, on the OpenMV IDE, go to Tools > Dataset Editor and select New Dataset to start the dataset collection.

The IDE will ask you to open the folder where your data will be saved; choose the "data" folder that was just created. Note that new icons will appear on the left panel.

Using the upper icon (1), enter the first class name, for example, "periquito".

Run the dataset_capture_script.py; clicking on the bottom icon (2) will start capturing images.

Repeat the same procedure for the other classes.

> *We suggest around 60 images from each category. Try to capture different angles, backgrounds, and light conditions.*

The stored images use a QVGA frame size (320x240) and the RGB565 color pixel format.

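For reference, the snippet below is a minimal sketch of what a capture loop like dataset_capture_script.py does on the OpenMV camera: configure the sensor and save one frame at a time. It is not the IDE's actual script, and the folder path and image count are illustrative assumptions.

```python
import sensor, time

sensor.reset()                       # Initialize the camera sensor
sensor.set_pixformat(sensor.RGB565)  # RGB565 color format
sensor.set_framesize(sensor.QVGA)    # 320x240 frames
sensor.skip_frames(time=2000)        # Let the camera settle

for i in range(60):                  # ~60 samples per class
    img = sensor.snapshot()
    img.save("periquito/%03d.jpg" % i)   # Hypothetical class folder
    time.sleep_ms(500)               # Time to vary angle, background, light
```
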

After capturing your dataset, close the Dataset Editor Tool on Tools > Dataset Editor.

On your computer, you will end up with a dataset that contains three classes: periquito, robot, and background.

You should return to Edge Impulse Studio and upload the dataset to your project.

## Training the model with Edge Impulse Studio

We will use the Edge Impulse Studio for training our model. Enter your account credentials at Edge Impulse and create a new project.

> *Here, you can clone a similar project: [NICLA-Vision_Image_Classification](https://studio.edgeimpulse.com/public/273858/latest).*

## Dataset

Using the EI Studio (or *Studio*), we will go through four main steps to have our model ready for use on the Nicla Vision board: Dataset, Impulse, Tests, and Deploy (on the Edge Device, in this case, the NiclaV).

Regarding the Dataset, it is essential to point out that our Original Dataset, captured with the OpenMV IDE, will be split into three parts: Training, Validation, and Test. The Test Set will be split off at the beginning and set aside, to be used only in the Test phase after training. The Validation Set will be used during training.

On the Studio, go to the Data acquisition tab, and in the UPLOAD DATA section, upload from your computer the files for the chosen categories.

Leave it to the Studio to automatically split the original dataset into training and test, and choose the label related to that specific data.

Repeat the procedure for all three classes. At the end, you should see your "raw data" in the Studio.

The Studio allows you to explore your data, showing a complete view of all the data in your project. You can clear, inspect, or change labels by clicking on individual data items. In our case, a simple project, the data seems OK.

## The Impulse Design

In this phase, we should define how to:

- Pre-process our data, which consists of resizing the individual images and determining the color depth to use (RGB or Grayscale), and

- Design a Model that will be "Transfer Learning (Images)" to fine-tune a pre-trained MobileNet V2 image classification model on our data. This method performs well even with relatively small image datasets (around 150 images in our case).

Transfer Learning with MobileNet offers a streamlined approach to model training, which is especially beneficial for resource-constrained environments and projects with limited labeled data. MobileNet, known for its lightweight architecture, is a pre-trained model that has already learned valuable features from a large dataset (ImageNet).

By leveraging these learned features, you can train a new model for your specific task with fewer data and computational resources yet achieve competitive accuracy.

This approach significantly reduces training time and computational cost, making it ideal for quick prototyping and deployment on embedded devices where efficiency is paramount.

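To make the pattern concrete, here is a minimal Keras sketch of transfer learning: a frozen MobileNetV2 backbone with a small classification head on top. It is not the exact model Edge Impulse builds; the alpha value and head sizes are illustrative (Keras ships ImageNet weights only for certain alpha values, 0.35 being the smallest, whereas the Studio offers its own 0.1 variant).

```python
import tensorflow as tf

NUM_CLASSES = 3  # periquito, robot, background

# Pre-trained backbone; its weights stay frozen while the head is trained.
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), alpha=0.35, include_top=False, weights="imagenet"
)
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(12, activation="relu"),   # small dense head
    tf.keras.layers.Dropout(0.15),                  # matches the Studio settings
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```
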
Go to the Impulse Design Tab and create the *impulse*, defining an image size of 96x96 and squashing them (squared form, without crop). Select the Image and Transfer Learning blocks. Save the Impulse.

### **Image Pre-Processing**

All input QVGA/RGB565 images will be converted to 27,648 features (96x96x3).

Press [Save parameters] and Generate all features.

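The feature count follows directly from the resize: a 96x96 RGB image flattens to 96 x 96 x 3 = 27,648 values. A minimal sketch of the equivalent pre-processing step (the helper name is illustrative, not the Studio's code):

```python
import tensorflow as tf

def preprocess(image):
    # Resize ("squash") to 96x96, keeping the 3 RGB channels.
    image = tf.image.resize(image, (96, 96))
    return tf.reshape(image, [-1])   # flattened feature vector

print(96 * 96 * 3)  # 27648 features per image
```
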
## Model Design

In 2017, Google introduced [MobileNetV1](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html), a family of general-purpose computer vision neural networks designed with mobile devices in mind to support classification, detection, and more. MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of various use cases. In 2018, Google launched [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381).

MobileNet V1 and MobileNet V2 both aim at mobile efficiency and embedded vision applications, but they differ in architectural complexity and performance. While both use depthwise separable convolutions to reduce the computational cost, MobileNet V2 introduces Inverted Residual Blocks and Linear Bottlenecks to enhance performance. These new features allow V2 to capture more complex features using fewer parameters, making it computationally more efficient and generally more accurate than its predecessor. Additionally, V2 employs a non-linear activation in the intermediate expansion layer but uses a linear activation for the bottleneck layer, a design choice found to better preserve important information through the network. MobileNet V2 offers a more optimized architecture for higher accuracy and efficiency and will be used in this project.

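As a quick illustration of the building block both MobileNets rely on, here is a minimal Keras sketch of a depthwise separable convolution: a per-channel (depthwise) 3x3 convolution followed by a 1x1 (pointwise) convolution. The filter count is illustrative, not the MobileNet configuration.

```python
import tensorflow as tf

def depthwise_separable_block(x, filters):
    # Depthwise 3x3: one filter per input channel (cheap spatial filtering).
    x = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    # Pointwise 1x1: mixes channels to produce the desired number of filters.
    x = tf.keras.layers.Conv2D(filters, kernel_size=1, padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)

inputs = tf.keras.Input(shape=(96, 96, 3))
outputs = depthwise_separable_block(inputs, filters=32)
tf.keras.Model(inputs, outputs).summary()
```
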

Although the base MobileNet architecture is already tiny and has low latency, a specific use case or application may often require the model to be smaller and faster. MobileNets introduce a straightforward parameter α (alpha), called the width multiplier, to construct these smaller, less computationally expensive models. The role of the width multiplier α is to thin the network uniformly at each layer.

Edge Impulse Studio has MobileNetV1 (96x96 images) and V2 (96x96 and 160x160 images) available, with several different **α** values (from 0.05 to 1.0). For example, you will get the highest accuracy with V2, 160x160 images, and α=1.0. Of course, there is a trade-off: the higher the accuracy, the more memory (around 1.3M RAM and 2.6M ROM) will be needed to run the model, implying more latency. The smallest footprint is obtained at the other extreme, with MobileNetV1 and α=0.10 (around 53.2K RAM and 101K ROM).

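A small sketch of the width multiplier in action: instantiating MobileNetV2 with different alpha values (untrained, `weights=None`) and printing the parameter counts shows how alpha thins every layer and shrinks the model. The values here are for illustration, not the Studio's memory estimates.

```python
import tensorflow as tf

for alpha in (1.0, 0.35, 0.1):
    m = tf.keras.applications.MobileNetV2(
        input_shape=(96, 96, 3), alpha=alpha, weights=None, include_top=False)
    print(f"alpha={alpha}: {m.count_params():,} parameters")
```
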

For this project, we will use **MobileNetV2 96x96 0.1**, which estimates a memory cost of 265.3 KB in RAM. This model should be OK for the Nicla Vision with 1MB of SRAM. On the Transfer Learning Tab, select this model.

Another necessary technique to use with Deep Learning is **Data Augmentation**. Data augmentation is a method that can help improve the accuracy of machine learning models by creating additional artificial data. A data augmentation system makes small, random changes to your training data during the training process (such as flipping, cropping, or rotating the images).

Under the hood, here you can see how Edge Impulse implements a data augmentation policy on your data:

```python
import math
import random
import tensorflow as tf

# INPUT_SHAPE is defined elsewhere in the generated training code,
# e.g., (96, 96, 3) for this project.

# Implements the data augmentation policy
def augment_image(image, label):
    # Flips the image randomly
    image = tf.image.random_flip_left_right(image)

    # Increase the image size, then randomly crop it down to
    # the original dimensions
    resize_factor = random.uniform(1, 1.2)
    new_height = math.floor(resize_factor * INPUT_SHAPE[0])
    new_width = math.floor(resize_factor * INPUT_SHAPE[1])
    image = tf.image.resize_with_crop_or_pad(image, new_height, new_width)
    image = tf.image.random_crop(image, size=INPUT_SHAPE)

    # Vary the brightness of the image
    image = tf.image.random_brightness(image, max_delta=0.2)

    return image, label
```
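
In the generated training pipeline, a function like this is mapped over the training split with `tf.data`, so the changes are applied on the fly at every epoch. A minimal sketch of that usage (the `train_dataset` variable is an assumption):

```python
# Apply the augmentation only to the training split, in parallel.
train_dataset = train_dataset.map(
    augment_image, num_parallel_calls=tf.data.AUTOTUNE
)
```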

Exposure to these variations during training can help prevent your model from taking shortcuts by "memorizing" superficial clues in your training data, meaning it may better reflect the deep underlying patterns in your dataset.

The final layer of our model will have 12 neurons with a 15% dropout for overfitting prevention. The training result is excellent, with 77 ms of latency, which should result in about 13 fps (frames per second) during inference (1,000 ms / 77 ms ≈ 13).

## Model Testing

Now, you should take the data set aside at the start of the project and run the trained model with it as input.

The result was, again, excellent.

## Deploying the model

At this point, we can deploy the trained model as a .tflite file and use the OpenMV IDE to run it using MicroPython, or we can deploy it as a C/C++ or an Arduino library.

**Arduino Library**

First, let's deploy it as an Arduino Library.

You should install the library as a .zip file in the Arduino IDE and run the sketch nicla_vision_camera.ino, available in Examples under your library name.

> *Note that the Arduino Nicla Vision has, by default, 512KB of RAM allocated for the M7 core and an additional 244KB on the M4 address space. In the code, this allocation was changed to 288 kB to guarantee that the model will run on the device (`malloc_addblock((void*)0x30000000, 288 * 1024);`).*

The result was good, with 86 ms of measured latency.

Here is a short video showing the inference results: [https://youtu.be/bZPZZJblU-o](https://youtu.be/bZPZZJblU-o)

**OpenMV**

It is possible to deploy the trained model to be used with OpenMV in two ways: as a library and as firmware.

When deployed as a library, three files are generated: the .tflite model, a list with the labels, and a simple MicroPython script that can make inferences using the model.

Running this model as a .tflite directly on the Nicla was impossible. So, we can either sacrifice accuracy by using a smaller model or deploy the model as OpenMV Firmware (FW). As FW, the Edge Impulse Studio generates the optimized models, libraries, and frameworks needed to make the inference. Let's explore this last option.

Select OpenMV Firmware on the Deploy Tab and press [Build].

On your computer, you will find a ZIP file. Open it.

Use the Bootloader tool on the OpenMV IDE to load the FW on your board.

Select the appropriate file (.bin for Nicla-Vision).

After the download is finished, press OK.

If a message says that the FW is outdated, DO NOT UPGRADE. Select [NO].

Now, open the script **ei_image_classification.py** that was downloaded from the Studio together with the .bin file for the Nicla.

Run it. Pointing the camera at the objects we want to classify, the inference result will be displayed on the Serial Terminal.

**Changing the Code to Add Labels**

The code provided by Edge Impulse can be modified so that, for testing purposes, we can see the inference result directly on the image displayed on the OpenMV IDE.

[Upload the code from GitHub](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification.py), or modify it as below:

```python
# Marcelo Rovai - NICLA Vision - Image Classification
# Adapted from Edge Impulse - OpenMV Image Classification Example
# @24Aug23

import sensor, image, time, os, tf, uos, gc

sensor.reset()                         # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)    # Set pxl fmt to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)      # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))       # Set 240x240 window.
sensor.skip_frames(time=2000)          # Let the camera adjust.

net = None
labels = None

try:
    # Load built in model
    labels, net = tf.load_builtin_model('trained')
except Exception as e:
    raise Exception(e)

clock = time.clock()
while(True):
    clock.tick()  # Starts tracking elapsed time.

    img = sensor.snapshot()

    # default settings just do one detection
    for obj in net.classify(img,
                            min_scale=1.0,
                            scale_mul=0.8,
                            x_overlap=0.5,
                            y_overlap=0.5):
        fps = clock.fps()
        lat = clock.avg()

        print("**********\nPrediction:")
        img.draw_rectangle(obj.rect())
        # This combines the labels and confidence values into a list of tuples
        predictions_list = list(zip(labels, obj.output()))

        max_val = predictions_list[0][1]
        max_lbl = 'background'
        for i in range(len(predictions_list)):
            val = predictions_list[i][1]
            lbl = predictions_list[i][0]

            if val > max_val:
                max_val = val
                max_lbl = lbl

        # Print label with the highest probability
        if max_val < 0.5:
            max_lbl = 'uncertain'
        print("{} with a prob of {:.2f}".format(max_lbl, max_val))
        print("FPS: {:.2f} fps ==> latency: {:.0f} ms".format(fps, lat))

        # Draw label with highest probability to image viewer
        img.draw_string(
            10, 10,
            max_lbl + "\n{:.2f}".format(max_val),
            mono_space=False,
            scale=2
        )
```

Here you can see the result.

Note that the latency (136 ms) is almost double what we got directly with the Arduino IDE. This is because we are using the IDE as an interface, and the measurement includes the time spent waiting for the camera to be ready. If we start the clock just before the inference, as sketched below, the latency drops to only 71 ms.

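A minimal sketch of that change to the loop (only the placement of `clock.tick()` differs from the full script above):

```python
while True:
    img = sensor.snapshot()      # Camera capture happens outside the timed section

    clock.tick()                 # Start timing right before the inference
    for obj in net.classify(img,
                            min_scale=1.0,
                            scale_mul=0.8,
                            x_overlap=0.5,
                            y_overlap=0.5):
        fps = clock.fps()
        lat = clock.avg()        # Now reflects the inference time only
```
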
> *The NiclaV runs about half as fast when connected to the IDE. The FPS should increase once disconnected.*

### **Post-Processing with LEDs**

When working with embedded machine learning, we are looking for devices that can continuously run inference and act on the results directly in the physical world, rather than displaying them on a connected computer. To simulate this, we will define one LED to light up for each of the possible inference results.

For that, we should [upload the code from GitHub](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification_LED.py) or change the last code to include the LEDs:

```python
# Marcelo Rovai - NICLA Vision - Image Classification with LEDs
# Adapted from Edge Impulse - OpenMV Image Classification Example
# @24Aug23

import sensor, image, time, os, tf, uos, gc, pyb

ledRed = pyb.LED(1)
ledGre = pyb.LED(2)
ledBlu = pyb.LED(3)

sensor.reset()                         # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)    # Set pixl fmt to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)      # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))       # Set 240x240 window.
sensor.skip_frames(time=2000)          # Let the camera adjust.

net = None
labels = None

ledRed.off()
ledGre.off()
ledBlu.off()

try:
    # Load built in model
    labels, net = tf.load_builtin_model('trained')
except Exception as e:
    raise Exception(e)

clock = time.clock()


def setLEDs(max_lbl):

    if max_lbl == 'uncertain':
        ledRed.on()
        ledGre.off()
        ledBlu.off()

    if max_lbl == 'periquito':
        ledRed.off()
        ledGre.on()
        ledBlu.off()

    if max_lbl == 'robot':
        ledRed.off()
        ledGre.off()
        ledBlu.on()

    if max_lbl == 'background':
        ledRed.off()
        ledGre.off()
        ledBlu.off()


while(True):
    img = sensor.snapshot()
    clock.tick()  # Starts tracking elapsed time.

    # default settings just do one detection.
    for obj in net.classify(img,
                            min_scale=1.0,
                            scale_mul=0.8,
                            x_overlap=0.5,
                            y_overlap=0.5):
        fps = clock.fps()
        lat = clock.avg()

        print("**********\nPrediction:")
        img.draw_rectangle(obj.rect())
        # This combines the labels and confidence values into a list of tuples
        predictions_list = list(zip(labels, obj.output()))

        max_val = predictions_list[0][1]
        max_lbl = 'background'
        for i in range(len(predictions_list)):
            val = predictions_list[i][1]
            lbl = predictions_list[i][0]

            if val > max_val:
                max_val = val
                max_lbl = lbl

        # Print label and turn on LED with the highest probability
        if max_val < 0.8:
            max_lbl = 'uncertain'

        setLEDs(max_lbl)

        print("{} with a prob of {:.2f}".format(max_lbl, max_val))
        print("FPS: {:.2f} fps ==> latency: {:.0f} ms".format(fps, lat))

        # Draw label with highest probability to image viewer
        img.draw_string(
            10, 10,
            max_lbl + "\n{:.2f}".format(max_val),
            mono_space=False,
            scale=2
        )
```

Now, each time a class gets a result above 0.8, the corresponding LED will be lit as follows:

- LED Red On: Uncertain (no class is over 0.8)

- LED Green On: Periquito > 0.8

- LED Blue On: Robot > 0.8

- All LEDs Off: Background > 0.8

Here is the result.

### **Image Classification (non-official) Benchmark**

Several development boards can be used for embedded machine learning (tinyML), and the most common ones for Computer Vision applications (with low energy) are the ESP32-CAM, the Seeed XIAO ESP32S3 Sense, and the Arduino Nicla Vision and Portenta.

Taking the opportunity, the same trained model was deployed on the ESP32-CAM, the XIAO, and the Portenta (on this last one, the model was trained again, using grayscale images to be compatible with its camera). Here is the result of deploying the models as Arduino Libraries.

## Conclusion

Before we finish, consider that Computer Vision is more than just image classification. For example, you can develop Edge Machine Learning projects around vision in several areas, such as:

- **Autonomous Vehicles**: Use sensor fusion, lidar data, and computer vision algorithms to navigate and make decisions.

- **Healthcare**: Automated diagnosis of diseases through MRI, X-ray, and CT scan image analysis.

- **Retail**: Automated checkout systems that identify products as they pass through a scanner.

- **Security and Surveillance**: Facial recognition, anomaly detection, and object tracking in real-time video feeds.

- **Augmented Reality**: Object detection and classification to overlay digital information in the real world.

- **Industrial Automation**: Visual inspection of products, predictive maintenance, and robot and drone guidance.

- **Agriculture**: Drone-based crop monitoring and automated harvesting.

- **Natural Language Processing**: Image captioning and visual question answering.

- **Gesture Recognition**: For gaming, sign language translation, and human-machine interaction.

- **Content Recommendation**: Image-based recommendation systems in e-commerce.