Neural network image recognition

Neural Network From Scratch / Image Recognition

Image Recognition

Image recognition is the new black. It seamlessly identifies people, animals, places, buildings and any other objects you configure. Already used by companies like Google, Shutterstock, Ebay, Salesforce and Pinterest, it's expected to grow even more.

The world of image recognition moves fast. Here's the proof: facial recognition technology became 20 times more accurate from 2014 to 2018.

Taking all this into account, we decided to build our own neural network from scratch. Our goal was to recognize medical masks in real-life street footage from webcams around the world. Here's how we did it.

Table of contents
1. Technology Stack
2. The Basics
3. Training Models
4. Anchors
5. Labels
6. Models
7. Sizes
8. Jitter
9. Datasets - Actual images and their description
10. Tools for Labelling
11. Commands for training models

The Essential Tech Stack

The basics

We decided to start simple. Earlier we had discovered a system that worked great for detecting fire. It used the ready-made ImageAI module in concert with a special repository. This exact system could recognize fire in images. We picked up the idea from the standard models and training pictures that were available online.

Challenge 1: Older Technologies

However, this system used an old-school tech stack. We had to install Python 3.6, Tensorflow 1 and an older version of OpenCV.

Struggling with older technologies wasn't convenient. The only decent solution in this case is Anaconda, but that didn't work out either. After we tried to compile everything and debug it, we simply gave up.

As expected, using older technologies resulted in failures and bugs. We also couldn't train models at all: we kept receiving various errors during the training process.

Challenge 2: Trying to understand the basics

At this stage, we decided to dive right in and understand how recognition works from the inside. We used ImageAI, rewriting everything in NodeJs, which we find much more convenient than Python.

Keep in mind that this is Linux-based. If you are using Windows or macOS, you'll need to install a Linux subsystem. Yes, it's not going to work well on NodeJs, but the ultimate goal was to understand how the recognition worked.

So we chose the most powerful object recognition model, called Yolov3. This is a truly universal technology that can recognize anything in a short time.

Challenge 3: Old-school Tensorflow versions

Nevertheless, we soon faced another problem: Yolov3 was based on darknet (https://pjreddie.com/darknet/yolo/).

After we put everything together on Linux, a new issue came up. The versions weren't compatible, even though we used the latest version of each piece of software.

It turned out that the problem was an old-school version of Tensorflow. It's the neural network library used for training, the 'brains' behind our software. So we had to fix this issue for everything to start working.

The solution

The solution was to convert the code from one Tensorflow version to the other using a special conversion script. We launched it using the following command:

tf_upgrade_v2.py --infile yolo.py --outfile yolo_v2.py

The script converted the older TF1 code into TF2. This didn't work perfectly, and occasionally we had to rewrite the code by hand. But the problems were partially gone.

What's more, it was pretty fun to switch from Keras to Tensorflow throughout the entire code. Keras became a part of Tensorflow in its newest version, so we had to completely remove the older Keras code from the project.

After all this struggle, our custom ImageAI still couldn't train models. But it worked pretty well for recognizing photos and videos (shots). So the standard pre-trained model could recognize fire on video, great!

Training models

Finally, we found Yolo (version 3) on a web forum. It doesn't require Linux and supports the nightly version of Tensorflow 2. We used this library.

The forum user had completely rewritten the original latest version of Yolo so that it could support TF2 and Windows. At last, all versions were compatible. This library was just perfect for training models.

Here's the data you need to initialize the program:

"anchors": [31,29,86,119,34,27],
"labels": ["mask"],
"net_size": 288,
"pretrained": {
    "keras_format": "configsmask_500weights.h5",
    "darknet_format": "yolov3.weights"
},
"train": {
    "min_size": 288,
    "max_size": 288,
    "num_epoch": 30,
    "train_image_folder": "dataset/mask_500/train/images",
    "train_annot_folder": "dataset/mask_500/train/annotations",
    "valid_image_folder": "dataset/mask_500/train/images",
    "valid_annot_folder": "dataset/mask_500/train/annotations",
    "batch_size": 8,
    "learning_rate": 1e-4,
    "save_folder": "configs/mask_500",
    "jitter": false
}

So we started to dig deeper and found out the following:

1. Anchors

Anchors are reference box sizes, stored as pairs of width and height. They set the extent to which a detected box can widen or narrow, and how far it can move relative to the centre of the object.

Basically, we take a central point on a picture and move it left or right. In our case, we started with the simplest solution. Here it goes:

const kmeans = require("node-kmeans");

// Cluster the annotation box dimensions into anchor_num groups.
export function k_means(ann_dims, anchor_num) {
  return kmeans.clusterize(ann_dims, { k: anchor_num }, (err, res) => {
    if (err) console.error(err);
    // else console.log("%o", res);
  });
}

Here's the basic logic for calculating 'central' points:

const clusters = k_means(annotation_dims, num_anchors);

// Collect the centroid (width, height) of every cluster.
const centroids = [];
clusters.groups.forEach(group => {
  centroids.push(group.centroid);
});
const anchors: any = centroids;

// Indices of the anchors, ordered by ascending width.
const widths = anchors.map(c => c[0]);
const sorted_indices: any = widths
  .map((item, index) => {
    return { item, index };
  })
  .sort((a, b) => a.item - b.item)
  .map(i => i.index);

const anchor_array = [];
let out_string = "";
for (let i = 0; i < sorted_indices.length; i++) {
  // Walk the anchors in sorted order, scale them by 416 and truncate to integers.
  const anchor = anchors[sorted_indices[i]];
  anchor_array.push(Math.trunc(anchor[0] * 416));
  anchor_array.push(Math.trunc(anchor[1] * 416));
  out_string += Math.trunc(anchor[0] * 416) + "," + Math.trunc(anchor[1] * 416) + ", ";
}
const reverse_anchor_array = anchor_array;

It's likely that there are better solutions, but we decided not to get caught up in this. We had already received the 'magical' numbers, so we decided to move on.

2. Labels

This is simple. Labels are the names of the objects that we are looking for. In our case, it's "mask". In a perfect-world scenario, we would also need to distinguish faces without masks for comparison.

3. Models

We strictly need yolov3.weights for training models. This is a standard pre-defined model, needed for the initial training. It's critical that this default model isn't trained any further; it's used for the structure and annotations.

4. Sizes

In a nutshell, we need to shrink pictures for the training process. Their size should be a multiple of 32, because Yolov3 downsamples the image by a factor of 32. Ideally, the picture should be shrunk proportionally. Therefore, we need to define its min size, max size and net size.

Obviously, all pictures have different sizes. Therefore, a new issue came up: we couldn't combine different shapes. This means we need to use the same value in all three parameters.

Weak model: 288
Strong model: 41

The strong model takes a long time to train. It's pretty accurate; however, it doesn't always recognize all objects. After we played around with it, it turned out that it also requires many images.
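
The size rules from the Sizes section can be checked before a long training run starts. Below is a minimal sketch, assuming the three values have already been read from the config; the function name check_sizes and the stride parameter are our own illustration and are not part of the library.

```python
from typing import List


def check_sizes(net_size: int, min_size: int, max_size: int, stride: int = 32) -> List[str]:
    """Return a list of problems; an empty list means the sizes look fine.

    stride=32 reflects the factor by which Yolov3 downsamples its input.
    """
    problems = []
    for name, value in (("net_size", net_size), ("min_size", min_size), ("max_size", max_size)):
        if value % stride != 0:
            problems.append(f"{name}={value} is not a multiple of {stride}")
    # Different shapes could not be combined, so all three values should match.
    if not (net_size == min_size == max_size):
        problems.append("net_size, min_size and max_size should all be equal")
    return problems


print(check_sizes(288, 288, 288))  # the values from the config above
print(check_sizes(300, 288, 416))  # a deliberately broken combination
```

The first call reports no problems; the second reports both a size that is not a multiple of 32 and mismatched values.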
5. Batch size

Batch size is the number of pictures that are processed together in one training step. This number should be a multiple of 2. If you go for a bigger quantity, this will put a high load on your system.

6. Jitter

This value, in the range [0-1], is used for cropping images. As a rule, we use false or 0.3.
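
To make the jitter value more tangible, here is a small sketch of one common way a jitter setting in [0-1] can drive random cropping. This is an assumption for illustration only: the library's actual implementation may differ, and the function name jitter_crop_box is hypothetical.

```python
import random


def jitter_crop_box(width, height, jitter, rng=random):
    """Return (left, top, right, bottom) of a randomly cropped region.

    Each edge moves inwards by at most jitter / 2 of the image size,
    so jitter=0 (or False) leaves the image untouched.
    """
    jitter = float(jitter or 0.0)
    if not 0.0 <= jitter <= 1.0:
        raise ValueError("jitter must be in [0, 1]")
    max_dx = width * jitter / 2
    max_dy = height * jitter / 2
    left = int(rng.uniform(0, max_dx))
    top = int(rng.uniform(0, max_dy))
    # Keep at least one pixel, even for the extreme jitter=1 case.
    right = max(left + 1, width - int(rng.uniform(0, max_dx)))
    bottom = max(top + 1, height - int(rng.uniform(0, max_dy)))
    return left, top, right, bottom


print(jitter_crop_box(1280, 720, False))  # no cropping at all
print(jitter_crop_box(1280, 720, 0.3))    # a different box on every call
```

With jitter set to false the box covers the whole image, which matches the "false or 0.3" rule of thumb: either no cropping, or a moderate amount.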

7. Datasets - Actual images and their description
Since our task is to recognize masks on human faces, we simply headed over to Google and searched for relevant images, using queries such as "Medical masks on streets".

To speed things up, we used a simple utility: Picture Google Grabber. It retrieves images from Google using relevant keywords. Just enter a few search queries and you're done. You'll receive the full collection of images.

There is a slight issue. The previews in Google are low-quality, so we needed to follow each URL and download the original image from the site.

You can make it even more convenient and rename all images. Select all of them in the file manager and press F2; the pictures will be renamed automatically.
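
If you prefer a script to the F2 shortcut, the renaming step could look like the sketch below. It is only an illustration: the function name rename_sequentially, the "img" prefix and the list of extensions are our own assumptions, and the "img (1).jpg" pattern simply mimics what the file manager produces.

```python
from pathlib import Path


def rename_sequentially(folder, prefix="img", start=1):
    """Rename all images in folder to 'prefix (1).jpg', 'prefix (2).jpg', ..."""
    folder = Path(folder)
    images = sorted(
        p for p in folder.iterdir()
        if p.suffix.lower() in {".jpg", ".jpeg", ".png"}
    )
    # First pass: move to temporary names so new names cannot collide with old ones.
    temp = [
        p.rename(p.with_name(f"__tmp_{i}{p.suffix.lower()}"))
        for i, p in enumerate(images)
    ]
    # Second pass: assign the final sequential names.
    return [
        p.rename(p.with_name(f"{prefix} ({n}){p.suffix}"))
        for n, p in enumerate(temp, start)
    ]


# Example (hypothetical folder):
# rename_sequentially("dataset/mask_500/train/images")
```

Files with other extensions are left untouched, so annotation files in the same folder would not be renamed by accident.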

Next step: add annotations to your images and highlight all masks. Below you will find an annotation for Yolov3 in XML format.

Earlier we rewrote ImageAI using Nodejs, and now this came in handy.

<annotation>
  <folder>images</folder>
  <filename>img (3).jpg</filename>
  <path>C:GitaaaaaaaaaaaaaaaaaaaaaFIre-detectionsrcmasktrainannotationsimg (3).jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>1280</width>
    <height>720</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>mask</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>715</xmin>
      <ymin>445</ymin>
      <xmax>722</xmax>
      <ymax>448</ymax>
    </bndbox>
  </object>
</annotation>
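
To show how such an annotation can be consumed, here is a minimal sketch that reads a Pascal VOC file with Python's standard xml.etree.ElementTree and returns each box size as a fraction of the image size. We assume that normalized sizes like these are the kind of input the anchor clustering works on (the annotation_dims variable in the anchor code); the function name box_dims is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<annotation>
  <size><width>1280</width><height>720</height><depth>3</depth></size>
  <object>
    <name>mask</name>
    <bndbox><xmin>715</xmin><ymin>445</ymin><xmax>722</xmax><ymax>448</ymax></bndbox>
  </object>
</annotation>
"""


def box_dims(xml_text):
    """Return [(label, width_fraction, height_fraction), ...] for every object."""
    root = ET.fromstring(xml_text)
    img_w = int(root.findtext("size/width"))
    img_h = int(root.findtext("size/height"))
    dims = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        dims.append((obj.findtext("name"), (xmax - xmin) / img_w, (ymax - ymin) / img_h))
    return dims


print(box_dims(SAMPLE))
```

The sample string is a trimmed copy of the annotation above, keeping only the fields the function reads.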

Even though ImageAI failed to train models, it beautifully broke down the data into folders. They had the following structure:

Mask
    cache
    json
    logs
    models
    train
        annotations
        images

The annotations backup is located in cache. Meanwhile, we have labels and anchors in JSON. We really don't need logs and models. At the same time, the 'train' and 'validation' folders store pictures and annotations under matching names. This structure is pretty convenient for training multiple models and storing data.

The solution worked well with standard models. Nevertheless, we need to provide our own images and train on specific pictures with masks.

The JSON file with labels and anchors:

{"labels":["mask"],"anchors":[31,29,86,119,34,27,25,29,71,124,44,48,67,69,37,30,45,45]}
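
The flat anchors list in this JSON is easier to read as pairs of width and height. A minimal sketch, assuming the file content is available as a string (the function name anchor_pairs is our own); Yolov3 conventionally works with nine anchors, three for each of its detection scales, which matches the 18 numbers here.

```python
import json

CONFIG = '{"labels":["mask"],"anchors":[31,29,86,119,34,27,25,29,71,124,44,48,67,69,37,30,45,45]}'


def anchor_pairs(config_text):
    """Turn the flat anchors list into a list of (width, height) tuples."""
    flat = json.loads(config_text)["anchors"]
    if len(flat) % 2:
        raise ValueError("anchors must contain an even number of values")
    return list(zip(flat[0::2], flat[1::2]))


pairs = anchor_pairs(CONFIG)
print(len(pairs))  # 9
print(pairs[0])    # (31, 29)
```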

Labelling

We discovered 4 apps for labelling images on the web. Here they are, rated.

1) LabelIMG

At first sight, the program seems convenient to use. However, it requires Qt, a full-featured application framework that includes many modules. The program has advantages too. For instance, it saves annotations straight into Pascal VOC XML format, which is exactly what we need. Unfortunately, it's extremely complicated to install on Windows, which is why we decided to try other solutions.

2) VGG Image Annotator

This is just a simple webpage on the Internet. However, as we later found out, it's pretty sluggish and hard to use. If you do decide to use it, get ready to suffer. The major downside is that the data is saved as JSON, so we had to convert it into Pascal VOC. The webpage also doesn't use SSL, which is disappointing.

Taking all this into account, we decided to go with alternative options.

3) Supervise.ly

We gave up on this solution from the start. With Supervise.ly you need to highlight the 'needed' object every time. The truth is, we don't need such power at the moment. What's more, users have to clearly define all the objects, classes and types in advance. If you forget any of them, you will receive errors and the system stops working.

Naturally, we weren't too excited about this. The tool is recommended for more accurate, complex labelling. In our case, we can just use a basic square.

4) Labelbox

Labelbox is excellent software. It has a great hotkey system and lets you quickly highlight lots of objects. Besides, it's convenient for teams: multiple people can label objects in real time.

This tool also generates essential statistics, like the number of missed pictures, and it features many other cool tidbits. It's little surprise that we decided to use Labelbox.

Labelbox has 2 downsides:

1. The structure of information in JSON is too customized.

[
  {
    "ID": "",
    "DataRow ID": "",
    "Labeled Data": "",
    "Label": {
      "mask": [
        {
          "geometry": [
            { "x": 208, "y": 276 },
            { "x": 307, "y": 276 },
            { "x": 307, "y": 352 },
            { "x": 208, "y": 352 }
          ]
        }
      ]
    },
    "Created By": "@gmail.com",
    "Project Name": "masks 500",
    "Created At": "2020-03-26T07:40:26.000Z",
    "Updated At": "2020-03-26T07:40:26.000Z",
    "Seconds to Label": 7.069,
    "External ID": "img2 (124).jpg",
    "Agreement": null,
    "Benchmark Agreement": null,
    "Benchmark ID": null,
    "Benchmark Reference ID": null,
    "Dataset Name": "masks 500",
    "Reviews": [],
    "View Label": "",
    "Masks": {
      "mask": ""
    }
  }
]

We used ElementTree to compose the structure of the XML tree and set the parameters. Then we created a loop so that it would work for many images.
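
The conversion from the Labelbox JSON to Pascal VOC can be sketched roughly as follows. This is not our original script, just a minimal illustration with the standard xml.etree.ElementTree module. The function name labelbox_to_voc is hypothetical, and because the Labelbox record does not carry the image size, width and height are passed in as parameters (an assumption).

```python
import xml.etree.ElementTree as ET


def labelbox_to_voc(record, width, height, depth=3):
    """Build a Pascal VOC <annotation> element from one Labelbox record."""
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = "images"
    ET.SubElement(root, "filename").text = record["External ID"]
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for label, shapes in record["Label"].items():
        for shape in shapes:
            # The bounding box is the extent of the labelled corner points.
            xs = [point["x"] for point in shape["geometry"]]
            ys = [point["y"] for point in shape["geometry"]]
            obj = ET.SubElement(root, "object")
            ET.SubElement(obj, "name").text = label
            box = ET.SubElement(obj, "bndbox")
            for tag, value in (("xmin", min(xs)), ("ymin", min(ys)),
                               ("xmax", max(xs)), ("ymax", max(ys))):
                ET.SubElement(box, tag).text = str(value)
    return root


record = {
    "External ID": "img2 (124).jpg",
    "Label": {"mask": [{"geometry": [
        {"x": 208, "y": 276}, {"x": 307, "y": 276},
        {"x": 307, "y": 352}, {"x": 208, "y": 352},
    ]}]},
}
voc = labelbox_to_voc(record, width=1280, height=720)
print(ET.tostring(voc, encoding="unicode"))
```

To process a whole export, the same function would be called in a loop over every record in the JSON array, writing one XML file per image.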

Next step

All results are kept in the Annotations folder. We now have a full dataset with annotations and a convenient structure. The only thing left is to use our software for training.

Training Models

The software responds to the following commands:

// =============== read ===================

python src/pred.py -c configs/mask.json -i imgs/1.jpg

This command recognizes objects in an image. We just need to prepare our JSON file with the required parameters (described in the beginning) and set the path to the image.

// ================= test ================

python src/eval.py -c configs/mask.json

This is basically the benchmark of the model. The command shows 3 different parameters that describe the quality of the model.

{'fscore': 0.21052631578947367, 'precision': 0.8461538461538461, 'recall': 0.12021857923497267}

1) Fscore is the harmonic mean of precision and recall, a single number that sums up both.

2) Precision is the share of detections that are correct, i.e. how often the model finds the right object and doesn't make a mistake.

3) Recall is the share of the real objects in the images that the model manages to find.

In this run, the model is rarely wrong when it reports a mask, but it misses most of the masks.
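
As a quick sanity check on the three numbers printed by the test command, the F-score can be recomputed from precision and recall using the standard formula (the harmonic mean of the two). The helper name f_score below is our own.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 0.8461538461538461
recall = 0.12021857923497267
print(round(f_score(precision, recall), 4))  # 0.2105
```

The result matches the fscore value reported by the evaluation, which shows how the low recall drags the combined score down despite the high precision.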

// ================== train ===================

python src/train_eager.py -c configs/mask.json

We use this command for training our neural network. After lots of mistakes, we finally achieved the first results. In this instance, we used 90 images.

Size: 288, batch_size: 8, num_epoch: 20, training time: 1.5 hours.

At this point, we can already process images and videos. Potentially, with a more powerful model, this system could process real-life webcam footage.

Here's the result:
