Introduction
Deep learning is a machine learning technique that teaches computers
to do what comes naturally to humans: learn by example. Deep learning
is a key technology behind driverless cars, enabling them to recognize a
stop sign or to distinguish a pedestrian from a lamppost. It is the key to
voice control in consumer devices like phones, tablets, TVs, and hands-free
speakers. Deep learning has been getting a lot of attention lately, and for
good reason: it is achieving results that were not possible before.
We will first give an overview of neural networks and look at some of
their types as examples to understand the basics of neural networks, and
then move to a general description of deep learning.
1. Artificial Neural Networks (ANN)
Artificial neural networks are information-processing systems inspired by
biological neural systems. Neural networks consist of many simple
processing elements called neurons (also nodes, units, or cells).
Artificial neural networks have been developed as generalizations of
mathematical models of human cognition or neural biology, based on the
assumptions that:
1. Information processing occurs at many simple elements called
neurons.
2. Signals are passed between neurons over connection links.
3. Each connection link has an associated weight which, in a typical
neural net, multiplies the signal transmitted.
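These three assumptions can be sketched as a single artificial neuron. The following is an illustrative example only (the function name, weights, and threshold value are assumptions, not from the text): each incoming signal is multiplied by its connection weight, the weighted signals are summed, and the neuron fires if the sum exceeds a threshold.

```python
# A minimal artificial neuron (illustrative sketch):
# it sums the weighted input signals and applies a step activation.

def neuron(inputs, weights, threshold=0.5):
    """Fire (return 1) if the weighted sum of inputs exceeds the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# Each connection link multiplies the signal it transmits by its weight.
print(neuron([1, 0, 1], [0.4, 0.9, 0.2]))  # weighted sum 0.6 -> fires (1)
print(neuron([0, 1, 0], [0.4, 0.3, 0.2]))  # weighted sum 0.3 -> stays 0
```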
1.1 Types of architecture
A NN consists of input units, output units, and possibly hidden units.
Neural nets are often classified as single-layer or multilayer depending on
the number of layers (hidden units); the number of layers in a NN can be
defined as the number of layers of weighted interconnection links between
the neurons.
Single-layer NN
A single-layer NN has one layer of connection weights.
Multilayer NN
A multilayer net is a net with one or more layers (or levels) of nodes,
called hidden units, between the input units and the output units.
2 What Is Learning and What Is Generalization in ANNs?
We use the properties of ANNs to build a model that can solve a problem:
the model learns from the training data it is given by adjusting its
weights until the output achieves our goal.
Rather than writing a single program to solve every single problem, we can
generalize our model so that it solves several problems.
There are three types of learning:
a- Supervised learning: the training data comprise input vectors x and the
desired output vectors y. Training is performed until the neural
network "learns" to associate each input vector x with its corresponding
desired output vector y.
b- Unsupervised learning: only input vectors x are supplied; the neural
network learns some internal features of the whole set of input vectors
presented to it.
c- Reinforcement learning
Sometimes called reward-penalty learning, it is a combination of the
above two paradigms. It is based on presenting an input vector x to a
neural network and looking at the output vector calculated by the network. If
the output is considered "good," then a "reward" is given to the network in
the sense that the existing connection weights are increased; otherwise the
network is "punished": the connection weights, being considered "not
appropriately set," are decreased.
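The supervised paradigm above can be sketched as a tiny training loop. This is an illustrative perceptron-style example, not a method given in the text; the learning rate, epoch count, and the AND-function data set are all assumptions.

```python
# Supervised learning sketch: adjust the weights until each input vector x
# produces its desired output y (perceptron-style rule; illustrative only).

def predict(x, w, b):
    """Step-activation output of a single-layer net."""
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) + b > 0 else 0

def train(pairs, lr=0.1, epochs=20):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in pairs:
            err = y - predict(x, w, b)   # error drives the weight adjustment
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Learn the logical AND function from labelled (x, y) examples.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(data)
print([predict(x, w, b) for x, _ in data])  # matches the targets [0, 0, 0, 1]
```

After training, the learned weights reproduce the desired output for every training input, which is exactly the "learns to associate x with y" criterion described above.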
Learning rules
Hebbian Learning Rule
This learning rule was suggested by Hebb in 1949, and it can be stated as:
if two neurons on either side of a connection are activated
synchronously, then the weight of that connection is increased.
Note: Hebb's rule provides the basis for unsupervised learning.
Features of the Hebb rule
1. Unsupervised.
2. Fully connected.
3. Single layer (no hidden layer within the network).
A common use of the Hebb rule is character recognition, e.g. training a
net to distinguish between the pattern "X" and the pattern "O".
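Hebb's rule itself is a one-line weight update. The sketch below is illustrative (the function name, learning rate, and the bipolar example vectors are assumptions); it simply strengthens a connection whenever the input and output on its two sides are active together.

```python
# Hebb's rule sketch: if the neurons on both sides of a connection are
# active together, strengthen that connection (unsupervised; illustrative).

def hebb_update(weights, x, y, lr=1.0):
    """Increase w_i in proportion to the product of input x_i and output y."""
    return [w + lr * xi * y for w, xi in zip(weights, x)]

# Bipolar (+1/-1) patterns, as commonly used when training Hebb nets
# on patterns such as "X" vs "O".
w = [0, 0, 0]
w = hebb_update(w, [1, -1, 1], 1)    # -> [1, -1, 1]
w = hebb_update(w, [1, 1, -1], -1)   # -> [0, -2, 2]
print(w)
```

Note that the weight only grows when `x_i` and `y` have the same sign, i.e. when the two neurons are activated synchronously.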
3 Deep learning neural networks
How does deep learning work?
We will take an image recognition model as an example to illustrate how
deep learning works.
First stage: training the model
We feed the model a large number of images (e.g. many various
images of cats, dogs, wolves, tigers, etc.) as training data and
train it to recognize an unlabeled image. The first layer
usually consists of many activation functions with a specific job:
detecting the edges in the images. It takes the input images,
learns by extracting the edge features of the images,
and passes these features to the second layer. The second layer's
activation functions have another job, detecting object parts
(nose, mouth, eyes), so it takes the received features and learns from them.
The third layer's activation functions detect whole-object features
and learn from the features they receive.
Second stage: testing the model
When we input an unlabeled image to the model, for example a
dog image, the first layer matches the edges of the input image against all
learned images, and feeds the second layer with only the edges that have
similar features (e.g. the cat, wolf, dog, and raccoon have similar edges,
but the elephant is different). The second layer receives the obtained
features together with the input image from the first layer and matches
the object parts of the input image against only the images already
selected by the first layer (it matches the
nose, mouth, eyes, etc. with the images of the cat, wolf, dog, and
raccoon and ignores the other images), and feeds the third layer
with only the object parts that have similar features (e.g. the raccoon
and cat are excluded).
The third layer receives the obtained features together with the input image
from the second layer and matches the whole object of the input image
against the images selected by the previous layer (e.g. the wolf
image against the dog image), and the result is the dog image.
Notes:
a. Every layer has an independent function, but all layers are associated
hierarchically.
b. With every move from a lower layer to the next, the options are narrowed
until the goal is achieved.
c. A higher layer is more complex than a lower layer.
E.g. Classify a cat:
Bottom Layers: Edge detectors, curves, corners, straight lines
Middle Layers: Fur patterns, eyes, ears
Higher Layers: Body, head, legs
Top Layer: Cat or Dog
4 Deep Learning Algorithms
Deep learning (also known as deep structured learning, hierarchical learning,
or deep machine learning) is a class of machine learning algorithms that use a
cascade of many layers of nonlinear processing units for feature extraction and
transformation. Each successive layer uses the output of the previous layer
as input. The algorithms may be supervised or unsupervised, and applications
include pattern analysis (unsupervised) and classification (supervised).
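The "cascade" idea reduces to function composition: each layer is a nonlinear transformation applied to the previous layer's output. The two toy layers below are hypothetical stand-ins for learned transformations, purely for illustration.

```python
# Cascade sketch: each successive layer takes the previous layer's
# output as its input (toy nonlinear layers; illustrative only).

def layer1(x):
    return [max(0.0, v) for v in x]   # ReLU-like nonlinearity

def layer2(x):
    return [v * v for v in x]         # another nonlinear transform

def forward(x):
    return layer2(layer1(x))          # successive layers chained together

print(forward([-1.0, 0.5, 2.0]))      # [0.0, 0.25, 4.0]
```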
Convolutional neural networks (CNN)
A typical CNN is composed of many layers in a hierarchy, with some layers
for feature representations (or feature maps) and others acting as a
conventional neural network for classification. It often starts with two
alternating types of layers, called convolutional and subsampling layers:
convolutional layers perform convolution operations with several filter maps
of equal size, while subsampling layers reduce the sizes of the preceding
layers by averaging pixels within a small neighborhood (or by max-pooling).
Facebook uses neural nets for its automatic tagging algorithms, Google for
photo search, Amazon for product recommendations, and Instagram for its
search infrastructure. A simple ConvNet is a sequence of layers:
convolutional layer, pooling layer, and fully connected layer.
The figure shows a typical architecture of CNNs. The input is first convolved
with a set of filters (the C layers in Figure 2). These 2D filtered data are
called feature maps. After a nonlinear transformation, subsampling is
performed to reduce the dimensionality (the S layers in Figure 2). The
convolution/subsampling sequence can be repeated many times (as
predetermined by the user).
1- Convolutional layer: also referred to as the Conv layer, it forms the basis
of the CNN and performs the core operations of training and, consequently,
firing the neurons of the network. It performs the convolution operation over
the input volume and consists of a 3-dimensional arrangement of neurons (a
stack of 2-dimensional layers of neurons, one for each channel depth). Its
building block is the filter:
Filters (convolution kernels): a filter (or kernel) is an integral component
of the layered architecture. It refers to an operator applied to the entirety
of the image such that it transforms the information encoded in the pixels.
Figure (2) CNN
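The filter operation can be sketched as a "valid" 2D convolution in plain Python. This is an illustrative implementation (the image and edge-detecting kernel values are made up); real CNNs learn the filter values during training.

```python
# "Valid" 2D convolution sketch: slide a small filter (kernel) over the
# image and, at each position, sum the element-wise products. The result
# is one feature map (illustrative; real CNNs learn the kernel values).

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

# A tiny vertical-edge detector: it responds only where intensity changes.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
edge = [[1, -1]]
print(conv2d(img, edge))  # each row: [0, -1, 0] -- nonzero at the edge
```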
Spatial arrangement. We have explained the connectivity of each neuron in
the Conv layer to the input volume, but we haven't yet discussed how many
neurons there are in the output volume or how they are arranged. Three
hyperparameters control the size of the output volume: the depth, the
stride, and the zero-padding. We discuss these next:
1. First, the depth of the output volume is a hyperparameter: it corresponds
to the number of filters we would like to use, each learning to look for
something different in the input. For example, if the first convolutional
layer takes the raw image as input, then different neurons along the depth
dimension may activate in the presence of various oriented edges or blobs of
color. We will refer to a set of neurons that are all looking at the same
region of the input as a depth column (some people also prefer the
term fibre).
2. Second, we must specify the stride with which we slide the filter. When
the stride is 1, we move the filters one pixel at a time. When the
stride is 2 (or, uncommonly, 3 or more, though this is rare in practice),
the filters jump 2 pixels at a time as we slide them around. This
produces smaller output volumes spatially.
3. As we will soon see, it is sometimes convenient to pad the input
volume with zeros around the border. The size of this zero-padding is a
hyperparameter. The nice feature of zero-padding is that it allows us
to control the spatial size of the output volumes (most commonly, as we'll
see soon, we use it to exactly preserve the spatial size of the input
volume, so that the input and output width and height are the same).
We can compute the spatial size of the output volume as a function of the
input volume size (W), the receptive field size of the Conv layer neurons
(F), the stride with which they are applied (S), and the amount of zero
padding used on the border (P). You can convince yourself that the correct
formula for calculating how many neurons "fit" is given
by (W − F + 2P)/S + 1.
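The formula is easy to check in code. This small helper is illustrative (the function name is an assumption); it also rejects strides that do not fit neatly across the volume, i.e. when W − F + 2P is not divisible by S.

```python
# Output-size formula for a Conv layer: (W - F + 2P) / S + 1, where
# W = input size, F = receptive field (filter) size, S = stride,
# P = zero-padding on the border.

def conv_output_size(W, F, S, P):
    span = W - F + 2 * P
    if span % S != 0:
        raise ValueError("stride does not fit neatly across the volume")
    return span // S + 1

print(conv_output_size(5, 3, 1, 1))  # (5 - 3 + 2)/1 + 1 = 5
print(conv_output_size(5, 3, 2, 1))  # (5 - 3 + 2)/2 + 1 = 3
```

With W = 5, F = 3, P = 1, a stride of 3 raises the error, since (5 − 3 + 2) = 4 is not divisible by 3, matching the illustration below.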
Illustration of spatial arrangement. In this example there is only one spatial
dimension (the x-axis), one neuron with a receptive field size of F = 3, an
input size of W = 5, and zero-padding of P = 1.
Left: the neuron strides across the input with stride S = 1, giving an output
of size (5 − 3 + 2)/1 + 1 = 5.
Right: the neuron uses a stride of S = 2, giving an output of size
(5 − 3 + 2)/2 + 1 = 3. Notice that stride S = 3 could not be used, since it
would not fit neatly across the volume; in terms of the equation, this can be
determined since (5 − 3 + 2) = 4 is not divisible by 3.
The neuron weights in this example are shown on the far right, and its bias is
zero. These weights are shared across all yellow neurons (see parameter
sharing below).
2- Pooling layer (subsampling layer): it sits between successive Conv layers
in a ConvNet architecture. Its function is to progressively reduce the spatial
size of the representation, reducing the number of parameters and the amount
of computation in the network, and hence also controlling overfitting. The
pooling layer operates independently on every depth slice of the input and
resizes it spatially, using the MAX operation. The most common form is a
pooling layer with filters of size 2x2 applied with a stride of 2, which
downsamples every depth slice in the input by 2 along both width and height,
discarding 75% of the activations. Every MAX operation in this case takes a
max over 4 numbers (a little 2x2 region in some depth slice). The depth
dimension remains unchanged.
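The common 2x2, stride-2 case can be sketched in a few lines. This operates on a single depth slice (the example activation values are made up for illustration); a real pooling layer would apply it to every depth slice independently.

```python
# 2x2 max-pooling with stride 2: keep only the maximum of each 2x2
# region, halving width and height (75% of the activations discarded).

def max_pool_2x2(x):
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]

# One 4x4 depth slice -> one 2x2 output slice.
act = [[1, 3, 2, 1],
       [4, 6, 5, 0],
       [9, 2, 1, 7],
       [3, 8, 4, 6]]
print(max_pool_2x2(act))  # [[6, 5], [9, 7]]
```

Each output value is the max over one 2x2 region, so 16 activations become 4, and the depth dimension (not shown here) would remain unchanged.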
3- The fully connected layer: it is configured exactly the way its name
implies: it is fully connected to the output of the previous layer. Fully
connected layers are typically used in the last stages of a CNN to connect to
the output layer and construct the desired number of outputs.
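Because every output unit connects to every input, a fully connected layer is just a matrix-vector product plus a bias. The weights and bias values below are hypothetical, chosen only to illustrate mapping 3 input features to 2 output scores.

```python
# Fully connected layer sketch: every output unit sees every input, so
# the output is a matrix-vector product plus a bias (illustrative only).

def fully_connected(x, weights, bias):
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weights, bias)]

# Map 3 input features to 2 output scores (hypothetical weights/bias).
scores = fully_connected([1.0, 0.5, -1.0],
                         [[0.2, 0.4, 0.1],
                          [0.5, -0.3, 0.2]],
                         [0.0, 0.1])
print(scores)  # two class scores, one per output unit
```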
Advantages
1- Reduces the need for feature engineering, one of the most time-consuming
parts of machine learning practice.
2- Provides an architecture that can be adapted to new problems relatively
easily.
Disadvantages
1- Requires a large amount of data.
2- Is extremely computationally expensive to train. The most complex models
take weeks to train using hundreds of machines equipped with expensive
GPUs.
Deep learning applications:
1. Natural language processing
2. Computer vision
3. Speech recognition