• A convolutional neural network (CNN,
or ConvNet) is a class of deep, feed-forward
artificial neural networks that has been
successfully applied to analyzing visual imagery.
• A CNN consists of an input and an output layer, as
well as multiple hidden layers. The hidden layers are
either convolutional, pooling or fully connected.
3. Convolutional Layer:
• Convolutional layers apply a convolution operation
to the input, passing the result to the next layer. The
convolution emulates the response of an individual
neuron to visual stimuli.
• Each convolutional neuron processes data only for
its receptive field. Tiling these receptive fields
across the image allows CNNs to tolerate translation
of the input image.
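The receptive-field idea above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not a reference implementation: the function name conv2d_valid, the 5x5 image and the 2x2 kernel are assumptions made up for the example, and the loop computes the sliding dot product (cross-correlation) that deep learning libraries usually call "convolution", with no padding and a stride of 1.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over a 2D `image` (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output neuron only sees its own receptive field.
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical toy input: a single bright pixel on a dark 5x5 image.
image = np.zeros((5, 5))
image[1, 1] = 1.0
kernel = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

response = conv2d_valid(image, kernel)
# Shift the input one pixel to the right and convolve again.
shifted = conv2d_valid(np.roll(image, 1, axis=1), kernel)
```

Because the same kernel is applied at every position, shifting the input by one pixel simply shifts the response by one pixel: shifted[:, 1:] matches response[:, :-1].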
5. Pooling Layer:
• Convolutional networks may include local or global
pooling layers, which combine the outputs of
neuron clusters at one layer into a single neuron in
the next layer. This shrinks the representation, which
helps reduce the risk of overfitting.
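As a rough sketch of local and global pooling, the NumPy snippet below uses max pooling for the local case and averaging for the global case. Both choices, the function names max_pool2d and global_avg_pool, and the 4x4 feature map are assumptions for illustration; the text above does not say which pooling operation is used.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Local pooling: non-overlapping max over (size x size) clusters.

    Assumes the height and width of `x` are divisible by `size`.
    """
    h, w = x.shape
    # Split the map into clusters, then keep the maximum of each cluster.
    blocks = x.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

def global_avg_pool(x):
    """Global pooling: collapse the whole feature map into one value."""
    return x.mean()

# Hypothetical 4x4 feature map holding the values 0..15.
feature_map = np.arange(16, dtype=float).reshape(4, 4)

pooled = max_pool2d(feature_map)           # 2x2 map, one value per cluster
summary = global_avg_pool(feature_map)     # single number for the whole map
```

Here each 2x2 cluster of neurons is reduced to a single neuron, so the 4x4 map becomes a 2x2 map.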
7. Fully Connected Layer:
• Fully connected layers connect every neuron in one
layer to every neuron in another layer. This is in
principle the same as the traditional multi-layer
perceptron neural network.
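A fully connected layer can be sketched as a single matrix-vector product. The sizes below (8 inputs, 3 outputs), the random weights and the function name fully_connected are assumptions chosen only for illustration, and no activation function is applied.

```python
import numpy as np

rng = np.random.default_rng(0)

def fully_connected(x, weights, bias):
    """Every output neuron takes a weighted sum over all inputs."""
    return weights @ x + bias

# Hypothetical sizes: 8 input neurons feeding 3 output neurons.
inputs = rng.standard_normal(8)
weights = rng.standard_normal((3, 8))   # one row of weights per output neuron
bias = np.zeros(3)

outputs = fully_connected(inputs, weights, bias)
```

The weight matrix has one entry per (output, input) pair, 3 x 8 = 24 here, which is what "every neuron connected to every neuron" means in practice.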
3D volumes of neurons. Convolutional Neural Networks take advantage
of the fact that the input consists of images and they constrain the
architecture in a more sensible way.
• In particular, unlike a regular Neural Network, the layers of a
ConvNet have neurons arranged in 3 dimensions: width, height, and depth.
• For example, the input images in CIFAR-10 are an input
volume of activations, and the volume has dimensions
32x32x3 (width, height, depth respectively). As we will
soon see, the neurons in a layer will only be connected to
a small region of the layer before it, instead of all of the
neurons in a fully-connected manner. Moreover, for CIFAR-10
the final output layer would have dimensions
1x1x10, because by the end of the ConvNet architecture
we will reduce the full image into a single vector of class
scores, arranged along the depth dimension. Here is a