The document discusses Convolutional Neural Networks (CNNs), a type of deep learning algorithm used for computer vision tasks. CNNs have convolutional layers that apply filters to input images to extract features, and pooling layers that reduce the spatial size of representations. They use shared weights and local connectivity to classify images. Common CNN architectures described include LeNet-5, AlexNet, VGG16, GoogLeNet and ResNet, with increasing numbers of layers and parameters over time.
CNN overview for image classification
1. Convolutional Neural Network (CNN)
In the name of God
Mehrnaz Faraz
Faculty of Electrical Engineering
K. N. Toosi University of Technology
Milad Abbasi
Faculty of Electrical Engineering
Sharif University of Technology
2. CNN
• A supervised deep learning algorithm
• Not a fully connected neural network
• Suitable for big data and tensors
– Tensor: Multidimensional array
• Uses relatively little pre-processing compared to other
algorithms
3. Using CNN
• Computer vision
– Face recognition
– Scene labelling
– Image classification
– Action recognition
– Human pose estimation
– Document analysis
• Natural Language Processing
– Speech recognition
6. CNN
• Using convolutional layers
• Using pooling layers
• Using multiple filters in a layer
– Creates different outputs in a layer
• Suitable for image data
7. Convolutional Layer
• An example input volume in red (e.g. a 32x32x3 image)
– Color image: Height, Width, Depth (Channels)
– Each pixel has 3 channels (R,G and B)
Input image: 32x32x3
Filter: 5x5x3
[Figure: input volume (height 32, width 32, depth 3) and filter (5x5x3)]
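As a rough illustration of the slide above, the following NumPy sketch slides one 5x5x3 filter over a 32x32x3 input. The function name `conv2d_single_filter`, the random data, and the choice of stride 1 with no padding are assumptions made for this example, not details from the slides.

```python
import numpy as np

def conv2d_single_filter(image, kernel):
    """Slide one filter over an H x W x C image (stride 1, no padding)."""
    H, W, C = image.shape
    kh, kw, kc = kernel.shape
    assert C == kc, "filter depth must match input depth"
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Each output value depends only on a local kh x kw x C patch.
            patch = image[y:y + kh, x:x + kw, :]
            out[y, x] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(0)       # hypothetical random input data
image = rng.random((32, 32, 3))      # 32x32 colour image (R, G, B)
kernel = rng.random((5, 5, 3))       # one 5x5x3 filter
feature_map = conv2d_single_filter(image, kernel)
print(feature_map.shape)             # (28, 28)
```

Each filter produces one 2-D feature map; using several filters in a layer would stack several such maps along the depth axis.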
12. Convolutional Layer
• Convolutional layer is NOT fully connected
– Each neuron is connected only to a local region in the input
volume spatially
13. Convolutional Layer
• Increasing the number of neurons increases the number of
parameters and the computational burden
• Parameter sharing
– Sharing of weights by all neurons in a particular feature map
– Reduces the number of parameters
• Local connectivity
– Each neuron is connected only to a subset of the input image
14. Number of Parameters
• Input: 256x256x3
– Parameters: 256*256*3+1 = 196,609
• Kernel: 128x128x3 (parameter sharing)
– Parameters: 128*128*3+1 = 49,153
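The two parameter counts on this slide can be reproduced with a few lines of arithmetic. In this sketch the helper names are hypothetical, and the "+1" is read as a single bias term, which is an assumption about the slide's intent.

```python
def weights_plus_bias(height, width, depth):
    """Number of weights covering a height x width x depth region, plus one bias."""
    return height * width * depth + 1

# Weights spanning the whole 256x256x3 input.
full_input = weights_plus_bias(256, 256, 3)
# One shared 128x128x3 kernel, reused at every position of the feature map.
shared_kernel = weights_plus_bias(128, 128, 3)

print(full_input)     # 196609
print(shared_kernel)  # 49153
```

Because the kernel weights are shared across positions, the count for the kernel does not grow with the number of output neurons in the feature map.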
17. Padding
• Without padding, the feature map is smaller than the input
• To maintain the same dimensionality:
– Pad the input by surrounding it with zeros
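Zero padding can be sketched with `numpy.pad`. The image contents and the padding width `p = 2` are assumptions chosen so that a 5x5 filter with stride 1 would return a feature map of the original 32x32 size.

```python
import numpy as np

image = np.ones((32, 32, 3))   # hypothetical all-ones 32x32 colour image
p = 2                          # assumed padding width on each side

# Pad height and width with zeros; leave the channel axis untouched.
padded = np.pad(image, ((p, p), (p, p), (0, 0)),
                mode="constant", constant_values=0)

print(padded.shape)            # (36, 36, 3)
```

With a 5x5 filter and stride 1, the padded 36x36 input yields 36 - 5 + 1 = 32 positions per axis, matching the unpadded input size.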
19. Example
• Size of feature map: $o = \frac{i + 2p - k}{s} + 1$
– i: size of input
– k: size of kernel
– p: padding
– s: stride
– o: size of feature map
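The relation between i, k, p, s and o can be checked numerically. This is a minimal sketch: the function name is hypothetical, and rounding down when the stride does not divide evenly is an assumption (a common convention) rather than something stated on the slide.

```python
def feature_map_size(i, k, p=0, s=1):
    """Feature-map size o = (i + 2p - k) / s + 1, rounded down (assumed)."""
    return (i + 2 * p - k) // s + 1

print(feature_map_size(32, 5))            # 28: no padding, stride 1
print(feature_map_size(32, 5, p=2))       # 32: padding keeps the input size
print(feature_map_size(7, 3, p=0, s=2))   # 3: stride 2 shrinks the output
```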
20. Non-linearity
• Add ReLU after each convolutional layer
• Introduces nonlinearity to a system that has otherwise been
computing only linear operations in the conv layers
• ReLU does not saturate for positive inputs
[Figure: input image, feature maps, and the convolutional layer as stacked feature maps]
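ReLU applied element-wise to a feature map can be sketched as follows. The small example array is hypothetical data chosen for illustration.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: negative values become 0, positive values pass through."""
    return np.maximum(x, 0)

feature_map = np.array([[-2.0, 0.5],
                        [3.0, -0.1]])   # hypothetical conv-layer output
activated = relu(feature_map)
print(activated)
```

Positive activations are left unchanged however large they are, which is the sense in which ReLU does not saturate.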
22. Pooling Layer
• Also called a subsampling layer
• Inserted periodically between Conv layers in a ConvNet
• Reduces the number of parameters, the size of the data, and the
computation in the network
• Helps control overfitting
• Types of pooling:
– Stride
– Mean pooling
– Max pooling
– Sum pooling
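Max, mean and sum pooling differ only in how each window is reduced. The sketch below assumes a 2x2 window with stride 2 on a single 2-D feature map; the function name `pool2d` and the example values are hypothetical.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Reduce each size x size window of a 2-D feature map to one value."""
    reduce = {"max": np.max, "mean": np.mean, "sum": np.sum}[mode]
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = x[r * stride:r * stride + size,
                       c * stride:c * stride + size]
            out[r, c] = reduce(window)
    return out

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 10, 13, 14],
              [11, 12, 15, 16]], dtype=float)

print(pool2d(x, mode="max"))    # [[ 4.  8.] [12. 16.]]
print(pool2d(x, mode="mean"))   # [[ 2.5  6.5] [10.5 14.5]]
print(pool2d(x, mode="sum"))    # [[10. 26.] [42. 58.]]
```

In every mode the 4x4 map shrinks to 2x2, which is how pooling reduces the size of the data passed to later layers.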
24. CNN Overview
• CNNs have two components:
– The Hidden layers/Feature extraction part
• Performs a series of convolution and pooling operations
• The convolution is performed on the input data with a
filter (kernel) to produce a feature map
– The Classification part
• Assigns a probability that the object in the image is
what the algorithm predicts it to be
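One common way for the classification part to turn final scores into class probabilities is a softmax. The slides do not name the function used, so softmax is an assumption here, and the scores are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])      # hypothetical scores for three classes
probs = softmax(scores)
print(probs, probs.argmax())
```

The class with the largest score receives the largest probability, and that index is taken as the prediction.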
29. Common Architectures in CNN
• Classic network architectures:
– LeNet-5
– AlexNet
– VGG16
• Modern network architectures:
– Inception (GoogLeNet)
– ResNet
– ResNeXt
– DenseNet
30. LeNet-5
– 7 layers
– 3 convolutional layers (C1, C3 and C5)
– 2 sub-sampling (mean pooling) layers (S2 and S4)
– 1 fully connected layer (F6), followed by the output layer
– About 60,000 parameters
LeCun et al. in 1998
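The order of magnitude of the parameter count can be checked with a simplified reading of LeNet-5. This sketch assumes 5x5 kernels, a 1-channel input, 6, 16 and 120 feature maps in C1, C3 and C5, 84 units in F6 and 10 outputs. It also assumes C3 is connected to all six S2 maps and that pooling has no trainable parameters, which differs from the original 1998 configuration, so the total is only approximate.

```python
def conv_params(kernel, in_maps, out_maps):
    """Weights plus one bias per output map for a square-kernel conv layer."""
    return (kernel * kernel * in_maps + 1) * out_maps

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit for a fully connected layer."""
    return (n_in + 1) * n_out

layers = {
    "C1": conv_params(5, 1, 6),
    "C3": conv_params(5, 6, 16),     # assumed fully connected to S2
    "C5": conv_params(5, 16, 120),
    "F6": dense_params(120, 84),
    "output": dense_params(84, 10),
}
total = sum(layers.values())
print(layers)
print(total)   # 61706
```

Under these assumptions the total lands close to the roughly 60,000 parameters quoted for the network.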
31. AlexNet
– The general architecture is quite similar to LeNet-5
– This model is considerably larger than LeNet-5
– Opened the way for deep learning in computer vision tasks
– About 60 million parameters
Alex Krizhevsky et al. in 2012
32. VGG16
– Offers a deeper yet simpler variant of the convolutional
structures
– About 138 million parameters
Introduced in 2014
33. GoogLeNet
– Built from a basic unit referred to as an "Inception cell"
In 2014, researchers at Google