How to create a neural network that detects people wearing masks: the ultimate A-to-Z workflow for building a neural network that recognizes images.
A short intro to the paper: https://blog.fulcrum.rocks/neural-network-image-recognition
2. Neural Network From Scratch
Image Recognition is the new black. It seamlessly identifies people, animals, places, buildings and any other objects you configure.
The world of Image Recognition moves fast. Here's the proof: facial recognition technology became 20 times more accurate from 2014 to 2018. Image Recognition is already used by companies like Google, Shutterstock, eBay, Salesforce and Pinterest, and it's expected to grow even more.
Taking all this into account, we decided to build our own neural network from scratch. Our goal was to recognize medical masks in real-life street footage from webcams around the world. Here's how we did it.
Table of contents
1. Technology Stack
2. The Basics
3. Training Models
4. Anchors
5. Labels
6. Models
7. Sizes
8. Jitter
9. Datasets - Actual images and their description
10. Tools for Labelling
11. Commands for Training Models
The basics
We decided to start simple. Earlier, we had discovered a system that worked great for detecting fire. It used a ready-made model in concert with ImageAI, a special repository. This exact system could recognize fire in images. We picked up the idea from the standard models and training images that were available online.
Challenge 1: Older Technologies
However, this system used an old-school tech stack. We had to install Python 3.6, Tensorflow 1 and an older version of OpenCV.
Struggling with older technologies wasn't convenient. The only decent solution in this case is called Anaconda, but this didn't work out either. After we tried to compile and debug everything, we simply gave up.
As expected, using older technologies resulted in failures and bugs. We also couldn't train models at all: we would typically receive various errors during training.
Challenge 2: Trying to understand the basics
At this stage, we decided to dive right in and understand how recognition works from the inside. We used ImageAI, rewriting everything in NodeJs. For us, this was much more convenient than using Python.
Keep in mind that this setup is Linux-based. If you are using Windows or macOS, you'll need to install a Linux subsystem. Admittedly, it's not going to work perfectly on NodeJs, but the ultimate goal was to understand how recognition worked.
So we chose the most powerful object recognition model, called Yolov3. This is a truly universal technology that can recognize almost anything in a short time.
Challenge 3: Old-school Tensorflow versions
Nevertheless, we soon faced another problem: Yolov3 was based on darknet (https://pjreddie.com/darknet/yolo/).
After we put everything together on Linux, a new issue came up. The versions weren't compatible, even though we used the latest version of each piece of software.
It turned out that the problem was code written for an old-school version of Tensorflow, the framework used to train the neural network. Tensorflow was the "brains" behind our software, so we had to fix this issue for everything to start working.
The solution was to convert our code between Tensorflow versions using tf_upgrade_v2, the upgrade script that ships with Tensorflow 2. We launched it using the following command:
tf_upgrade_v2.py --infile yolo.py --outfile yolo_v2.py
The script converted the older TF1 code into TF2. This didn't work perfectly, and occasionally we had to rewrite the code by hand, but it solved part of the problems.
What's more, it was pretty fun to change Keras to Tensorflow's built-in Keras throughout the entire code. In its newest version, Keras became part of Tensorflow, so we had to completely remove the older standalone Keras code from the project.
So after all this struggle, our custom ImageAI still couldn't train models. But it worked pretty well for recognizing photos and videos (frames), so the standard pretrained model could recognize fire in video. Great!
Training models
Finally, we found a Yolo (version 3) implementation on a web forum. It doesn't require Linux and supports the nightly version of Tensorflow 2. We used this library.
The forum user had completely rewritten the original Yolo code so that it supported TF2 and Windows. Finally, all versions were compatible, and the library was just perfect for training models.
Here's the data you need to initialize the program:
{
  "anchors": [31,29,86,119,34,27],
  "labels": ["mask"],
  "net_size": 288,
  "pretrained": {
    "keras_format": "configs/mask_500/weights.h5",
    "darknet_format": "yolov3.weights"
  },
  "train": {
    "min_size": 288,
    "max_size": 288,
    "num_epoch": 30,
    "train_image_folder": "dataset/mask_500/train/images",
    "train_annot_folder": "dataset/mask_500/train/annotations",
    "valid_image_folder": "dataset/mask_500/train/images",
    "valid_annot_folder": "dataset/mask_500/train/annotations",
    "batch_size": 8,
    "learning_rate": 1e-4,
    "save_folder": "configs/mask_500",
    "jitter": false
  }
}
So we started to dig deeper and found out the following:
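Several of the rules described below (matching sizes, anchor pairs, the jitter range) can be checked before a long training run. Here is a minimal sketch under our own assumptions: the `TrainConfig` interface and `checkConfig` helper are hypothetical names, not part of the library, and the multiple-of-32 rule comes from YOLOv3 downsampling its input by a factor of 32.

```typescript
// Hypothetical type for the config above; only the fields we check are listed.
interface TrainConfig {
  anchors: number[];
  labels: string[];
  net_size: number;
  train: {
    min_size: number;
    max_size: number;
    batch_size: number;
    jitter: boolean | number;
  };
}

// Returns human-readable problems; an empty array means every check passed.
function checkConfig(cfg: TrainConfig): string[] {
  const problems: string[] = [];
  const { min_size, max_size, batch_size, jitter } = cfg.train;
  const sizes: Array<[string, number]> = [
    ["net_size", cfg.net_size],
    ["min_size", min_size],
    ["max_size", max_size],
  ];
  // YOLOv3 downsamples by 32, so every size must divide evenly by 32.
  for (const [name, value] of sizes) {
    if (value % 32 !== 0) problems.push(`${name}=${value} is not a multiple of 32`);
  }
  // One fixed size for all three parameters avoids mixing input shapes.
  if (min_size !== max_size || min_size !== cfg.net_size) {
    problems.push("net_size, min_size and max_size differ");
  }
  // Anchors are stored as flat (width, height) pairs.
  if (cfg.anchors.length % 2 !== 0) problems.push("anchors must be (width, height) pairs");
  if (cfg.labels.length === 0) problems.push("labels must not be empty");
  if (batch_size <= 0) problems.push("batch_size must be positive");
  if (typeof jitter === "number" && (jitter < 0 || jitter > 1)) {
    problems.push("jitter must be false or a value in [0, 1]");
  }
  return problems;
}

// The values from the config above pass every check.
const cfg: TrainConfig = {
  anchors: [31, 29, 86, 119, 34, 27],
  labels: ["mask"],
  net_size: 288,
  train: { min_size: 288, max_size: 288, batch_size: 8, jitter: false },
};
console.log(checkConfig(cfg)); // []
```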
1. Anchors
Anchors are preset box shapes, each one a (width, height) pair. They define how much the predicted boxes can widen or narrow down; the network also predicts how far a box should move toward the centre of the object.
Basically, we take a central point on a picture and move it left or right. In our case, we started with the simplest solution: k-means clustering of the annotated box dimensions. Here it goes:
const kmeans = require("node-kmeans");
// Groups the (width, height) pairs of all annotated boxes into anchor_num clusters
export function k_means(ann_dims, anchor_num) {
  return kmeans.clusterize(ann_dims, { k: anchor_num }, (err, res) => {
    if (err) console.error(err);
    // else console.log("%o", res);
  });
}
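If you'd rather not depend on node-kmeans, the same idea fits in a few lines. This is a minimal, dependency-free sketch: `kMeansDims` is a hypothetical helper that runs plain k-means with Euclidean distance over normalized (width, height) pairs, and it seeds the centroids with the first k boxes, which is a simplification of our own.

```typescript
type Dim = [number, number]; // [width, height], normalized to the 0-1 range

// Plain k-means over box dimensions; returns one centroid (anchor) per cluster.
function kMeansDims(dims: Dim[], k: number, maxIter = 100): Dim[] {
  if (dims.length < k) throw new Error("need at least k boxes");
  // Simplified seeding: the first k boxes become the initial centroids.
  let centroids: Dim[] = dims.slice(0, k).map(([w, h]): Dim => [w, h]);
  for (let iter = 0; iter < maxIter; iter++) {
    // Assignment step: sum up the boxes closest to each centroid.
    const sums = centroids.map(() => [0, 0, 0]); // [sumW, sumH, count]
    for (const [w, h] of dims) {
      let best = 0;
      let bestDist = Infinity;
      centroids.forEach(([cw, ch], i) => {
        const d = (w - cw) ** 2 + (h - ch) ** 2;
        if (d < bestDist) {
          bestDist = d;
          best = i;
        }
      });
      sums[best][0] += w;
      sums[best][1] += h;
      sums[best][2] += 1;
    }
    // Update step: move each centroid to the mean of its boxes (empty ones stay put).
    const next = centroids.map((c, i): Dim =>
      sums[i][2] > 0 ? [sums[i][0] / sums[i][2], sums[i][1] / sums[i][2]] : c
    );
    const moved = next.some((c, i) => c[0] !== centroids[i][0] || c[1] !== centroids[i][1]);
    centroids = next;
    if (!moved) break; // converged
  }
  return centroids;
}

// Two small boxes and two larger ones give two clearly separated anchors:
// roughly (0.205, 0.29) and (0.075, 0.065).
const found = kMeansDims([[0.07, 0.07], [0.08, 0.06], [0.2, 0.28], [0.21, 0.3]], 2);
console.log(found.map(([w, h]) => [Math.trunc(w * 416), Math.trunc(h * 416)]));
```

Euclidean distance favours large boxes; the YOLO authors used 1 - IoU as the distance instead, which is worth trying if the anchors look off.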
Here's the basic logic for calculating the "central" points:
// Cluster the normalized (width, height) of every annotated box
const clusters = k_means(annotation_dims, num_anchors);
// Each cluster centroid is one anchor: [width, height] in the 0-1 range
const anchors: number[][] = clusters.groups.map((group: { centroid: number[] }) => group.centroid);
// Indices of the anchors ordered from narrowest to widest
const sorted_indices: number[] = anchors
  .map((anchor, index) => ({ width: anchor[0], index }))
  .sort((a, b) => a.width - b.width)
  .map(entry => entry.index);
const anchor_array: number[] = [];
let out_string = "";
for (const i of sorted_indices) {
  // Scale to pixels for a 416 x 416 network input
  const w = Math.trunc(anchors[i][0] * 416);
  const h = Math.trunc(anchors[i][1] * 416);
  anchor_array.push(w, h);
  out_string += w + "," + h + ", ";
}
const reverse_anchor_array = anchor_array;
It's likely that there are better solutions, but we decided not to get caught up in this. We had already received the "magical" numbers, so we moved on.
2. Labels
This is simple. Labels are the names of the objects that we are looking for. In our case, it's a "Mask". In a perfect-world scenario, we would also need to distinguish faces without masks for comparison.
3. Models
We strictly need yolov3.weights to train models. This is a standard pretrained model needed for the initial training. It's critical that this default model isn't trained any further; it's only used as the base for the structure and annotations.
4. Sizes
In a nutshell, we need to shrink pictures for training (their size should be a multiple of 32, because YOLOv3 downsamples its input by a factor of 32). Ideally, each picture should be shrunk proportionally. Therefore, we need to define the min size, max size and net size.
Obviously, all pictures have different sizes, so a new issue came up: we couldn't combine different shapes. This means we need to use the same value for all three parameters.
[Figure: sample detections labelled Man, Bear, Dog, Kitty cat, Monkey and Bird]
Weak model: 288
Strong model: 41
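One common way to shrink a picture proportionally into a square network input is "letterboxing": scale by the smaller ratio and pad the rest. This is a sketch under that assumption; whether the library we used pads exactly like this is not confirmed, and `letterbox` / `boxToNet` are hypothetical helpers.

```typescript
interface Letterbox {
  scale: number; // factor applied to both width and height
  newW: number;  // resized width in pixels
  newH: number;  // resized height in pixels
  padX: number;  // padding added on the left (and roughly on the right)
  padY: number;  // padding added on the top (and roughly on the bottom)
}

// Fit a width x height picture into a netSize x netSize input without distortion.
function letterbox(width: number, height: number, netSize: number): Letterbox {
  if (netSize % 32 !== 0) throw new Error("netSize must be a multiple of 32 for YOLOv3");
  const scale = Math.min(netSize / width, netSize / height);
  const newW = Math.round(width * scale);
  const newH = Math.round(height * scale);
  return {
    scale,
    newW,
    newH,
    padX: Math.floor((netSize - newW) / 2),
    padY: Math.floor((netSize - newH) / 2),
  };
}

// Map an [xmin, ymin, xmax, ymax] box from original pixels into network-input pixels.
function boxToNet(
  box: [number, number, number, number],
  lb: Letterbox
): [number, number, number, number] {
  const [xmin, ymin, xmax, ymax] = box;
  return [
    xmin * lb.scale + lb.padX,
    ymin * lb.scale + lb.padY,
    xmax * lb.scale + lb.padX,
    ymax * lb.scale + lb.padY,
  ];
}

// A 1280 x 720 frame shrunk into a 288 input becomes 288 x 162 with 63 px bands above and below.
const lb = letterbox(1280, 720, 288);
console.log(lb.newW, lb.newH, lb.padY); // 288 162 63
```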
The strong model takes a long time to train. It's pretty accurate, but it doesn't always recognize all objects. After we played around with it, it also turned out to require many images.
5. Batch size
Batch size is the number of pictures the network processes together in one training step before it updates its weights. As a rule, this number is a multiple of 2 (we used 8). If you go for a bigger number, it will put a high load on your system.
6. Jitter
Jitter randomly crops the training images as a form of augmentation. It takes a value in the range [0-1]; as a rule, we use false or 0.3.
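To make the jitter value more concrete, here is one possible reading of it as a random crop. This is a sketch under our own assumption: each edge is trimmed by a random amount of up to half of jitter × the image dimension. The library's actual augmentation may differ (some YOLO implementations also vary the aspect ratio and scale), and `jitterCrop` is a hypothetical helper.

```typescript
interface Crop {
  x: number; // left edge of the crop
  y: number; // top edge of the crop
  w: number; // crop width
  h: number; // crop height
}

// Randomly trim each edge; `false` or 0 keeps the whole image.
// `rand` is injectable so results can be reproduced in tests.
function jitterCrop(
  width: number,
  height: number,
  jitter: number | false,
  rand: () => number = Math.random
): Crop {
  if (jitter === false || jitter === 0) return { x: 0, y: 0, w: width, h: height };
  if (jitter < 0 || jitter > 1) throw new Error("jitter must be false or in [0, 1]");
  // Each side may lose up to half the jitter budget, so the total trim stays below jitter * size.
  const maxDx = (jitter * width) / 2;
  const maxDy = (jitter * height) / 2;
  const left = Math.floor(rand() * maxDx);
  const right = Math.floor(rand() * maxDx);
  const top = Math.floor(rand() * maxDy);
  const bottom = Math.floor(rand() * maxDy);
  return {
    x: left,
    y: top,
    w: Math.max(1, width - left - right),
    h: Math.max(1, height - top - bottom),
  };
}

// With jitter = 0.3 and a fixed "random" value of 0.5, a 1280 x 720 frame is cropped evenly.
console.log(jitterCrop(1280, 720, 0.3, () => 0.5)); // { x: 96, y: 54, w: 1088, h: 612 }
```

Any bounding boxes on a cropped image have to be shifted by (x, y) and clipped to the crop as well.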
7. Datasets - Actual images and their description
Since our task is to recognize masks on human faces, we simply headed over to Google and searched for relevant images (for example, "Medical masks on streets").
To speed things up, we used a simple utility, Picture Google Grabber. It retrieves images from Google using relevant keywords.
There is a slight issue: the previews in Google are low-quality, so we needed to follow each URL and download the original image from the site.
[Screenshot: Picture Google Grabber, and the downloaded files 535.jpg to 544.jpg in the Files Grabber file manager]
Just enter a few search queries and you're done: you'll receive the full collection of images.
You can make it even more convenient and rename all the images. Select all of them and press F2: the pictures will be renamed automatically.
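If you'd rather script the renaming than do it by hand, here is a minimal sketch. `planRenames` is a hypothetical helper that only computes the new names, using the "img (1).jpg" pattern that also appears in the annotation example later on; applying the plan (for instance with Node's `fs.readdirSync` and `fs.renameSync`) is left to you.

```typescript
// Build a rename plan: image files only, sorted by their numeric names,
// numbered as "<base> (1).jpg", "<base> (2).jpg", ...
function planRenames(files: string[], base = "img"): Array<[from: string, to: string]> {
  const images = files
    .filter(f => /\.(jpe?g|png)$/i.test(f))
    .sort((a, b) => a.localeCompare(b, undefined, { numeric: true }));
  return images.map((from, i): [string, string] => {
    // Keep the original extension, normalized to lower case.
    const ext = from.slice(from.lastIndexOf(".")).toLowerCase();
    return [from, `${base} (${i + 1})${ext}`];
  });
}

console.log(planRenames(["540.jpg", "535.jpg", "notes.txt", "536.JPG"]));
// [ [ '535.jpg', 'img (1).jpg' ], [ '536.JPG', 'img (2).jpg' ], [ '540.jpg', 'img (3).jpg' ] ]
```

Renaming into a fresh folder avoids clashes when some files already follow the target pattern.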
Next step: add annotations to your images, highlighting all masks. Below you will find a sample Yolov3 annotation in Pascal VOC XML format. Earlier, we had rewritten ImageAI using Nodejs, and now this came in handy.
<annotation>
  <folder>images</folder>
  <filename>img (3).jpg</filename>
  <path>C:\Git\aaaaaaaaaaaaaaaaaaaaa\FIre-detection\src\mask\train\annotations\img (3).jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>1280</width>
    <height>720</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>mask</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>715</xmin>
      <ymin>445</ymin>
      <xmax>722</xmax>
      <ymax>448</ymax>
    </bndbox>
  </object>
</annotation>
Even though ImageAI failed to train models, it beautifully broke the data down into folders. They had the following structure:
Mask
  cache
  json
  logs
  models
  train
    annotations
    images
The annotations backup is located in cache. Meanwhile, the labels and anchors are stored in json:
{"labels":["mask"],"anchors":[31,29,86,119,34,27,25,29,71,124,44,48,67,69,37,30,45,45]}
We really don't need logs and models. At the same time, the "Train" and "Validation" folders store pictures and annotations under matching file names. This structure is pretty convenient for training multiple models and storing data.
The solution worked well with standard models. Nevertheless, we need to provide our own images and train on specific pictures with masks.
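Because pictures and annotations share file names, pairing them and splitting them into training and validation sets is straightforward. In the config shown earlier, the validation folders point at the training folders; a separate split gives a more honest benchmark. This is a minimal sketch with hypothetical helpers (`pairSamples`, `splitTrainValid`) and a simple deterministic "every n-th sample goes to validation" rule of our own.

```typescript
interface Sample {
  image: string;
  annotation: string;
}

// "img (3).jpg" -> "img (3)"; names without an extension are returned unchanged.
function baseName(file: string): string {
  const dot = file.lastIndexOf(".");
  return dot === -1 ? file : file.slice(0, dot);
}

// Pair each image with the annotation of the same base name; unannotated images are skipped.
function pairSamples(images: string[], annotations: string[]): Sample[] {
  const annByBase = new Map<string, string>(annotations.map(a => [baseName(a), a]));
  const samples: Sample[] = [];
  for (const image of images) {
    const annotation = annByBase.get(baseName(image));
    if (annotation !== undefined) samples.push({ image, annotation });
  }
  return samples;
}

// Deterministic split: every `every`-th sample goes to validation, the rest to training.
function splitTrainValid(samples: Sample[], every = 5): { train: Sample[]; valid: Sample[] } {
  const train: Sample[] = [];
  const valid: Sample[] = [];
  samples.forEach((s, i) => ((i + 1) % every === 0 ? valid : train).push(s));
  return { train, valid };
}

const samples = pairSamples(
  ["img (1).jpg", "img (2).jpg", "img (3).jpg"],
  ["img (1).xml", "img (3).xml"]
);
console.log(samples.length); // 2 (img (2).jpg has no annotation yet)
```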
Labeling
We discovered 4 apps for labelling images on the web. Here they are, rated.
1) LabelIMG
At first sight, the program seems convenient to use. However, it requires Qt, a full-featured application framework that includes many modules.
The program has advantages too. For instance, it saves annotations straight into Pascal VOC XML format, which is exactly what we need.
Unfortunately, it's extremely complicated to install on Windows, which is why we decided to try other solutions.
2) VGG Image Annotator
This is just a simple web page. However, as we later found out, it's pretty sluggish and hard to use. If you do decide to use it, get ready to suffer. The major downside is that the data is exported in JSON, so we had to convert it into Pascal VOC. The web page also isn't served over HTTPS (SSL), which is disappointing.
Taking all this into account, we decided to go with alternative options.
3) Supervise.ly
We gave up on this solution from the start. With supervise.ly, you need to highlight the "needed" object every time. The truth is, we don't need such power at the moment.
What's more, users have to clearly define all the objects, classes and types in advance. If you forget any of them, you will receive errors and the system stops working. Naturally, we weren't too excited about this. Supervise.ly is recommended for more accurate, complex labelling; in our case, we can just use a basic square.
4) Labelbox
Labelbox is excellent software. It has a great hotkey system, and it lets you highlight lots of objects quickly. Besides, it's convenient for teams: multiple people can label objects in real time.
This tool also generates essential statistics, like the number of missed pictures, and it features many other cool tidbits. It's little surprise that we decided to use Labelbox.
The Labelbox export DOES NOT include image dimensions. However, there's an image title at least, so we used opencv4nodejs to load each image and took the dimensions from there:
const cv = require("opencv4nodejs");
// Load the exported image by its title and read its dimensions
const image = cv.imread(`${img_dir}${ann["External ID"]}`);
const size = image.sizes;
The new issue that follows: each box comes as a list of corner points, while Pascal VOC expects xmin, ymin, xmax and ymax:
[
  { "x": 208, "y": 276 },
  { "x": 307, "y": 276 },
  { "x": 307, "y": 352 },
  { "x": 208, "y": 352 }
]
We fixed this with the help of the following code. Typescript saves the day.
// Collect the x and y coordinates of all corner points
const xArr: number[] = [];
const yArr: number[] = [];
object.geometry.forEach((e: { x: number; y: number }) => {
  xArr.push(e.x);
  yArr.push(e.y);
});
// The bounding box spans from the smallest to the largest coordinate
obj["xmin"] = [Math.min(...xArr)];
obj["ymin"] = [Math.min(...yArr)];
obj["xmax"] = [Math.max(...xArr)];
obj["ymax"] = [Math.max(...yArr)];
This turns the points into Pascal VOC coordinates like these:
<xmin>258</xmin><ymin>208</ymin><xmax>322</xmax><ymax>244</ymax>
Next step
We used ElementTree to compose the structure of the XML tree and set the parameters. Then we created a loop so that it would work for many images.
All results are kept in the Annotations folder. Therefore, we have a full dataset with annotations and a convenient structure. The only thing left is to use our software for training.
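For readers who don't want an XML library, the same Pascal VOC structure can be produced with plain string templates. This is a swapped-in technique (string building instead of ElementTree), sketched under our own assumptions; `toVocXml` and `esc` are hypothetical helpers, and the optional `<path>` and `<source>` elements from the sample annotation are omitted.

```typescript
interface VocBox {
  name: string;
  xmin: number;
  ymin: number;
  xmax: number;
  ymax: number;
}

// Escape the XML special characters so names like "a&b.jpg" stay valid.
const esc = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;").replace(/'/g, "&apos;");

// Build one annotation with the same elements as the sample shown earlier.
function toVocXml(filename: string, width: number, height: number, boxes: VocBox[], depth = 3): string {
  const objects = boxes.map(b =>
    [
      "  <object>",
      `    <name>${esc(b.name)}</name>`,
      "    <pose>Unspecified</pose>",
      "    <truncated>0</truncated>",
      "    <difficult>0</difficult>",
      "    <bndbox>",
      `      <xmin>${Math.round(b.xmin)}</xmin>`,
      `      <ymin>${Math.round(b.ymin)}</ymin>`,
      `      <xmax>${Math.round(b.xmax)}</xmax>`,
      `      <ymax>${Math.round(b.ymax)}</ymax>`,
      "    </bndbox>",
      "  </object>",
    ].join("\n")
  );
  return [
    "<annotation>",
    "  <folder>images</folder>",
    `  <filename>${esc(filename)}</filename>`,
    "  <size>",
    `    <width>${width}</width>`,
    `    <height>${height}</height>`,
    `    <depth>${depth}</depth>`,
    "  </size>",
    "  <segmented>0</segmented>",
    ...objects,
    "</annotation>",
  ].join("\n");
}

// One mask box, matching the sample annotation above.
const xml = toVocXml("img (3).jpg", 1280, 720, [{ name: "mask", xmin: 715, ymin: 445, xmax: 722, ymax: 448 }]);
console.log(xml);
```

In a loop over the Labelbox export, the width and height would come from the opencv4nodejs image and the box from the min/max code above.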
Commands for training models
The software responds to the following commands:
// =============== read ===================
python src/pred.py -c configs/mask.json -i imgs/1.jpg
This command recognizes objects in an image. We just need to prepare our JSON file with the required parameters (described in the beginning) and set the path to the image.
// ================= test ================
python src/eval.py -c configs/mask.json
This is basically the benchmark of the model. The command shows 3 different parameters that describe the quality of the model:
{'fscore': 0.21052631578947367, 'precision': 0.8461538461538461, 'recall': 0.12021857923497267}
1) Fscore is the harmonic mean of precision and recall: a single number that summarizes both.
2) Precision is the share of the model's detections that are correct, i.e. how rarely it makes a mistake when it does find something.
3) Recall is the share of the real objects in the images that the model actually finds.
In our case, the model is rarely wrong when it finds a mask (precision of about 0.85), but it misses most of the masks (recall of about 0.12).
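Here is how these three numbers are usually computed for object detection. This is a sketch, not the code of eval.py: a prediction typically counts as a true positive when its IoU (overlap) with a real box exceeds a threshold such as 0.5, but we have not checked which threshold eval.py uses. The counts in the example (22 true positives, 4 false positives, 161 missed masks) are just one combination that reproduces the scores above; we don't know the actual counts.

```typescript
interface Scores {
  precision: number;
  recall: number;
  fscore: number;
}

// tp: predicted boxes that matched a real mask
// fp: predicted boxes that matched nothing
// fn: real masks the model missed
function detectionScores(tp: number, fp: number, fn: number): Scores {
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  const fscore = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, fscore };
}

// Intersection over union of two [xmin, ymin, xmax, ymax] boxes:
// how well a predicted square overlaps the real one (1 = perfect match).
function iou(a: number[], b: number[]): number {
  const ix = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  const union = areaA + areaB - inter;
  return union <= 0 ? 0 : inter / union;
}

const scores = detectionScores(22, 4, 161);
console.log(scores); // precision ≈ 0.846, recall ≈ 0.120, fscore ≈ 0.211
console.log(iou([0, 0, 2, 2], [1, 1, 3, 3])); // 1/7 ≈ 0.143
```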
// ================== train ===================
python src/train_eager.py -c configs/mask.json
We use this command to train our neural network. After lots of mistakes, we finally achieved the first results. In this instance, we used 90 images.
Size: 288, batch_size: 8, num_epoch: 20, training time: 1.5 hours.
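These numbers also tell you how many weight updates the run performed, which helps when planning bigger datasets. A small sketch, assuming the library keeps the last, smaller batch of each epoch rather than dropping it (we haven't verified this); `trainingSteps` is a hypothetical helper.

```typescript
// One weight update per batch, repeated every epoch.
function trainingSteps(images: number, batchSize: number, epochs: number) {
  // Assumption: the final partial batch is kept, hence Math.ceil.
  const stepsPerEpoch = Math.ceil(images / batchSize);
  return { stepsPerEpoch, totalSteps: stepsPerEpoch * epochs };
}

const run = trainingSteps(90, 8, 20);
console.log(run); // { stepsPerEpoch: 12, totalSteps: 240 }
// 1.5 hours = 5400 s, so on average roughly 22.5 s per step on the hardware used
console.log((1.5 * 3600) / run.totalSteps); // 22.5
```

Since training time grows roughly linearly with the number of steps, a dataset five times larger at the same settings would take on the order of five times longer.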
At this point, we can already process images and videos. Potentially, with a more powerful model, this system could process real-life webcam footage.
Here's the result: