CNN overview for image classification

Convolutional Neural Network (CNN)
In the name of God
Mehrnaz Faraz
Faculty of Electrical Engineering
K. N. Toosi University of Technology
Milad Abbasi
Faculty of Electrical Engineering
Sharif University of Technology

CNN
• A supervised deep learning algorithm
• Not fully connected neural network
• Suitable for big data and tensors
– Tensor: Multidimensional array
• Uses relatively little pre-processing compared to other
algorithms

Using CNN
• Computer vision
– Face recognition
– Scene labelling
– Image classification
– Action recognition
– Human pose estimation
– Document analysis
• Natural Language Processing
– Speech recognition

Using CNN
[Figure: example images of face recognition, scene labelling, human pose estimation, and document analysis]

Using CNN
• Classification
• Object detection
• Segmentation

CNN
• Using convolutional layers
• Using pooling layers
• Using multiple filters in a layer
– Creates different outputs in a layer
• Suitable for image data

Convolutional Layer
• An example input volume in red (e.g. a 32x32x3 image)
– Color image: Height, Width, Depth (Channels)
– Each pixel has 3 channels (R,G and B)
Input image: 32x32x3 (height x width x depth)
Filter: 5x5x3

Convolutional Layer
• Convolving input with a filter
– Convolution: Sum of element-wise multiplications
– Example:
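The definition above (a sum of element-wise multiplications at every filter position) can be sketched in plain Python. This is only an illustration under assumptions: the 5x5 input and 3x3 filter values are made up for the demonstration, the function name `conv2d_single` is hypothetical, and the filter is slid without flipping, as is usual in CNNs.

```python
def conv2d_single(image, kernel):
    """Slide a square kernel over a 2-D image (no padding, stride 1).

    Each output value is the sum of element-wise products between the
    kernel and the image patch it currently covers.
    """
    k = len(kernel)                      # kernel is assumed to be k x k
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative values only (not taken from the slides).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = conv2d_single(image, kernel)
for row in feature_map:
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

A 5x5 input with a 3x3 filter gives a 3x3 feature map, which is why the output is smaller than the input when no padding is used.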

Convolutional Layer
• A neuron (a single number in the feature map): $w^T x + b$
– x: input
– w: filter
– b: bias
• Stacked feature map with 10 different filters

Convolutional Layer
• Stacked feature map:
[Figure: an input, two filters, and the resulting feature maps]

Convolutional Layer
• Convolutional layer is NOT fully connected
– Each neuron is connected only to a local region in the input
volume spatially

Convolutional Layer
• Increasing the number of neurons increases the number of parameters and the computational burden
• Parameter sharing
– Sharing of weights by all neurons in a particular feature map
– Reduces the number of parameters
• Local connectivity
– Each neuron is connected only to a subset of the input image

Number of Parameters
Input: 256x256x3
• One neuron connected to the whole input: 256*256*3+1 = 196,609 parameters
• Kernel: 128x128x3, with parameter sharing: 128*128*3+1 = 49,153 parameters
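The two counts on this slide can be checked with a few lines of Python. This is a small sketch that assumes the "+1" stands for a single bias term; the function name `neuron_parameters` is hypothetical.

```python
def neuron_parameters(height, width, depth):
    """Weights for one neuron (or one shared kernel) plus one bias."""
    return height * width * depth + 1


# A neuron connected to the whole 256x256x3 input.
fully_connected = neuron_parameters(256, 256, 3)

# A 128x128x3 kernel whose weights are shared across the feature map.
shared_kernel = neuron_parameters(128, 128, 3)

print(fully_connected)  # 196609
print(shared_kernel)    # 49153
```

With parameter sharing the count depends only on the kernel size, not on how many positions the kernel is applied to.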

Stride
• Specifies how much we move the convolution filter at
each step

Padding
• Without padding, the feature map is smaller than the input
• To maintain the same dimensionality
– Use padding to surround the input with zeros
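Zero padding can be illustrated with a short pure-Python sketch. The helper name `zero_pad` and the 2x2 example values are assumptions made for illustration only.

```python
def zero_pad(image, p):
    """Surround a 2-D image with a border of p zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left and right borders
    padded.extend([0] * width for _ in range(p))      # bottom border
    return padded


padded = zero_pad([[1, 2], [3, 4]], 1)
for row in padded:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 0]
# [0, 3, 4, 0]
# [0, 0, 0, 0]
```

With a 3x3 filter and stride 1, padding by 1 keeps the feature map the same size as the input.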

Example
[Figures: convolution with P=0, S=1; P=2, S=1; P=1, S=2; P=1, S=2]

Example
• Size of feature map:
– i: size of input
– k: size of kernel
– p: padding
– s: stride
– o: size of feature map
$$o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1$$
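The relation between input size, kernel size, padding, and stride can be tried out in code. The sketch below assumes the standard rule o = floor((i + 2p - k) / s) + 1 along one spatial dimension; the function name `feature_map_size` and the 5x5 input with a 3x3 kernel are illustrative assumptions, not values from the slides.

```python
def feature_map_size(i, k, p=0, s=1):
    """Output size along one dimension: floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1


# Hypothetical 5x5 input and 3x3 kernel with different padding/stride settings.
print(feature_map_size(5, 3, p=0, s=1))  # 3
print(feature_map_size(5, 3, p=2, s=1))  # 7
print(feature_map_size(5, 3, p=1, s=2))  # 3

# The 32x32 input and 5x5 filter from the convolutional-layer slide,
# assuming no padding and stride 1.
print(feature_map_size(32, 5))           # 28
```

A larger stride shrinks the feature map, while padding grows it. Choosing p = (k - 1) / 2 with s = 1 keeps the size unchanged for odd k.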
  
   

Non-linearity
• Add ReLU after each convolutional layer
• Introduces nonlinearity into a system that has otherwise been computing only linear operations in the conv layers
• ReLU does not saturate for positive inputs
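ReLU is simply max(0, x) applied to every value of a feature map. A minimal sketch, with a hypothetical helper name `relu` and made-up input values:

```python
def relu(feature_map):
    """Apply max(0, x) element-wise to a 2-D feature map."""
    return [[max(0, v) for v in row] for row in feature_map]


activated = relu([[-3, 2], [0, -1], [5, -7]])
print(activated)  # [[0, 2], [0, 0], [5, 0]]
```

Negative responses are set to zero, while positive responses pass through unchanged, so the output keeps growing with the input instead of flattening out.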
[Figure: input image, convolutional layer, and stacked feature maps]

Non-linearity
• Convolution + ReLU

Pooling Layer
• Or subsampling layer
• Inserted periodically between conv layers in a ConvNet
• Reduces the number of parameters, the size of the data, and the computation in the network
• Controls overfitting
• Types of pooling:
– Stride
– Mean pooling
– Max pooling
– Sum pooling

Pooling Layer
• Mean pooling
• Max pooling
With stride 2
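Mean, max, and sum pooling with stride 2 can be sketched as follows. The 2x2 window size, the function name `pool2d`, and the 4x4 example values are assumptions for illustration.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Pool a 2-D feature map with a size x size window moved by `stride`."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))
            elif mode == "mean":
                row.append(sum(window) / len(window))
            elif mode == "sum":
                row.append(sum(window))
            else:
                raise ValueError("mode must be 'max', 'mean' or 'sum'")
        pooled.append(row)
    return pooled


# Illustrative 4x4 feature map.
fm = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

print(pool2d(fm, mode="max"))   # [[6, 8], [3, 4]]
print(pool2d(fm, mode="mean"))  # [[3.25, 5.25], [2.0, 2.0]]
print(pool2d(fm, mode="sum"))   # [[13, 21], [8, 8]]
```

Each 2x2 window collapses to one number, so a 4x4 feature map becomes 2x2 and no parameters are learned in the process.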

CNN Overview
• CNNs have two components:
– The Hidden layers/Feature extraction part
• Perform a series of convolutions and pooling operations
• The convolution is performed on the input data with the
use of a filter or kernel to then produce a feature map
– The Classification part
• Assigns a probability that the object in the image belongs to the predicted class
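One common way to turn the final scores of the classification part into probabilities is the softmax function. The slides do not say which function is used, so softmax, the score values, and the class names below are assumptions for illustration.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three hypothetical classes.
classes = ["cat", "dog", "car"]
probs = softmax([2.0, 1.0, 0.1])
predicted = classes[probs.index(max(probs))]

print([round(p, 3) for p in probs])
print(predicted)  # cat
```

The class with the highest score also gets the highest probability, and that class is reported as the prediction.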

Training
• Back propagation:

Common Architectures in CNN
• Classic network architectures:
– LeNet-5
– AlexNet
– VGG16
• Modern network architectures:
– Inception (GoogLeNet)
– ResNet
– ResNeXt
– DenseNet

LeNet-5
– 7 layers
– 3 convolutional layers (C1, C3 and C5)
– 2 sub-sampling (pooling) layers (S2 and S4), using mean pooling
– 1 fully connected layer (F6)
– About 60,000 parameters
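The order of magnitude of the parameter count can be checked with a rough tally. The layer sizes below are not on the slide: they assume the commonly cited LeNet-5 configuration (32x32x1 input, 5x5 kernels, 6/16/120 filters in C1/C3/C5, 84 units in F6, 10 output classes). The sketch also assumes every C3 filter sees all 6 input maps and that pooling layers have no trainable parameters, which differs slightly from the 1998 design.

```python
def conv_params(kernel, in_channels, filters):
    """(kernel*kernel*in_channels weights + 1 bias) per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, outputs):
    """(inputs weights + 1 bias) per output unit."""
    return (inputs + 1) * outputs


# Assumed layer sizes (see the note above).
layers = {
    "C1": conv_params(5, 1, 6),
    "C3": conv_params(5, 6, 16),
    "C5": conv_params(5, 16, 120),
    "F6": dense_params(120, 84),
    "output": dense_params(84, 10),
}
total = sum(layers.values())

for name, count in layers.items():
    print(name, count)
print("total", total)  # total 61706
```

Under these assumptions the tally comes to roughly 61,700, consistent with the figure of about 60,000; most of the parameters sit in C5 and F6.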
LeCun et al. in 1998

AlexNet
– The general architecture is quite similar to LeNet-5
– This model is considerably larger than LeNet-5
– Opened the way for deep learning in computer vision tasks
– About 60 million parameters
Alex Krizhevsky et al. in 2012

VGG16
– Offers a deeper yet simpler variant of the convolutional
structures
– About 138 million parameters
Introduced in 2014

GoogLeNet
– Built from a basic unit referred to as an "Inception cell"
Introduced in 2014 by researchers at Google
