Artificial Neural Network
• An Artificial Neural Network (ANN) is a system based on a biological
neural network (the brain)
▫ The brain has approximately 100 billion neurons, which communicate
through electro-chemical signals
▫ Each neuron receives thousands of connections (signals)
▫ If the resulting sum of signals surpasses a certain threshold, a
response is sent
• An ANN attempts to recreate a computational mirror of the biological
neural network …
What is a Perceptron?
• A perceptron models a single neuron
• It receives n inputs (a feature vector)
• It computes a weighted sum of those inputs, applies an activation
function, and produces an output
• Used for linear (binary) classification
Perceptron
• The perceptron consists of weights, a summation processor, and an
activation function
• A perceptron takes a weighted sum of the inputs and outputs:
y = f(w1x1 + w2x2 + … + wnxn + b)
where f is the activation function and b is the bias
Weight & Bias
• The bias can also be treated as another input (with a constant value of 1)
▫ The bias allows the decision boundary to be shifted
• The weights determine the slope of the decision boundary
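
A minimal sketch in Python of the perceptron described above; the AND
weights and bias are illustrative values chosen by hand, not taken from
the slides:

    def perceptron(x, w, b):
        # Weighted sum of the inputs plus the bias, then a unit-step activation.
        s = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if s > 0 else 0

    # Example: weights and bias hand-picked so the perceptron computes logical AND.
    w, b = [1.0, 1.0], -1.5
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, perceptron(x, w, b))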
Transfer or Activation Functions
• The transfer function translates the input signals into an output signal
• It uses a threshold to produce the output
• Some examples are
▫ Unit step (threshold)
▫ Sigmoid (logistic regression)
▫ Piecewise linear
▫ Gaussian
Unit Step (Threshold)
• The output is set depending on whether the total input is greater or less
than some threshold value.
Sigmoid function
• It is used when the output is expected to be a positive number
▫ It generates outputs between 0 and 1
Gaussian
• Gaussian functions are continuous, bell-shaped curves
• They are used in radial basis function ANNs (RBF kernel – Chapter 14)
▫ The output is a real value
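
A small sketch of the unit step, sigmoid, and Gaussian functions above;
the threshold, center, and width parameters are illustrative defaults:

    import math

    def unit_step(x, threshold=0.0):
        # Output is 1 when the total input reaches the threshold, 0 otherwise.
        return 1 if x >= threshold else 0

    def sigmoid(x):
        # Squashes any real input into the range (0, 1).
        return 1.0 / (1.0 + math.exp(-x))

    def gaussian(x, center=0.0, width=1.0):
        # Continuous bell-shaped curve peaking at the center; output is a real value.
        return math.exp(-((x - center) ** 2) / (2 * width ** 2))

    for x in (-2.0, 0.0, 2.0):
        print(x, unit_step(x), round(sigmoid(x), 3), round(gaussian(x), 3))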
The learning rate
• Used when updating the weights and bias to obtain a smaller error
• Helps us control how much we change the weights and bias at each step
How does the algorithm work?
• Initialize the weights (to zero or small random values)
• Pick a learning rate (between 0 and 1)
• For each training example:
▫ Compute the activation output
▫ Compute the error: the difference between the predicted and actual output
▫ Update the bias and weights
• Repeat until the error is very small or zero
• If the data are linearly separable, a solution will be found
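
A sketch of this learning loop, assuming a unit-step activation and the
standard perceptron update rule w ← w + η(y − ŷ)x, with the bias updated
like a weight on a constant input of 1:

    def train_perceptron(data, lr=0.1, epochs=100):
        # data: list of (inputs, label) pairs with labels 0 or 1.
        n = len(data[0][0])
        w, b = [0.0] * n, 0.0                 # initialize weights and bias to zero
        for _ in range(epochs):
            mistakes = 0
            for x, y in data:
                s = sum(wi * xi for wi, xi in zip(w, x)) + b
                y_hat = 1 if s > 0 else 0     # unit-step activation
                err = y - y_hat               # actual minus predicted
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
                mistakes += abs(err)
            if mistakes == 0:                 # every example classified correctly
                break
        return w, b

    # Logical AND is linearly separable, so the loop converges.
    print(train_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]))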
What if the data is non-linearly separable?
• The SLP (single-layer perceptron) is a linear classifier, so if the data
are not linearly separable the learning process will never find a solution
• For example: the XOR problem
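
To make this concrete, here is a sketch of the same learning loop run on
XOR; the misclassification count never reaches zero, because no single
line separates the two classes:

    # XOR: the label is 1 only when the two inputs differ.
    xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    w, b, lr = [0.0, 0.0], 0.0, 0.1
    for epoch in range(100):
        mistakes = 0
        for x, y in xor_data:
            y_hat = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = y - y_hat
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
            mistakes += abs(err)
    print("misclassified in the last epoch:", mistakes)  # always > 0 for XOR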
Multi-layer perceptron (MLP)
• A series of logistic regression models stacked on top of each other, with
the final layer being either another logistic regression or a linear
regression model, depending on whether we are solving a classification
or regression problem
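
A sketch of a forward pass through this stacked view: two sigmoid
(logistic) hidden units feeding one sigmoid output. The weights are
hand-picked, illustrative values that make the network compute XOR, which
the single perceptron above could not:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def mlp(x, hidden, output):
        # hidden: list of (weights, bias) per hidden unit; output: one (weights, bias).
        h = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for w, b in hidden]
        w, b = output
        return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)

    # Hand-picked weights: hidden unit 1 acts like OR, hidden unit 2 like AND,
    # and the output computes (OR and not AND), which is exactly XOR.
    hidden = [([20.0, 20.0], -10.0), ([20.0, 20.0], -30.0)]
    output = ([20.0, -20.0], -10.0)
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, round(mlp(x, hidden, output)))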
The Back-Propagation Algorithm
• Use the output error to adjust the weights of the inputs at the output layer
• Calculate the error at the previous layer and use it to adjust that
layer's weights
• Repeat this process, back-propagating errors through any number of layers
• The mathematical derivation of how to minimize the cost function of a
neural network is in Section 16.5.4, "The backpropagation algorithm"
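
A sketch of back-propagation for one hidden layer of sigmoid units with a
squared-error cost, trained on XOR; the layer size, learning rate, and
epoch count are illustrative choices (on a problem this small, training
can occasionally stall in a local minimum, in which case a different
random seed helps):

    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    random.seed(0)
    H = 3                                                   # hidden units
    W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
    b1 = [0.0] * H
    W2 = [random.uniform(-1, 1) for _ in range(H)]
    b2 = 0.0
    lr = 0.5
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    for epoch in range(20000):
        for x, y in data:
            # Forward pass
            h = [sigmoid(sum(W1[j][i] * x[i] for i in range(2)) + b1[j]) for j in range(H)]
            o = sigmoid(sum(W2[j] * h[j] for j in range(H)) + b2)
            # Backward pass: output delta, then the error propagated to the hidden layer
            d_o = (o - y) * o * (1 - o)
            d_h = [d_o * W2[j] * h[j] * (1 - h[j]) for j in range(H)]
            # Gradient-descent updates, output layer first
            for j in range(H):
                W2[j] -= lr * d_o * h[j]
                for i in range(2):
                    W1[j][i] -= lr * d_h[j] * x[i]
                b1[j] -= lr * d_h[j]
            b2 -= lr * d_o

    for x, y in data:
        h = [sigmoid(sum(W1[j][i] * x[i] for i in range(2)) + b1[j]) for j in range(H)]
        print(x, y, round(sigmoid(sum(W2[j] * h[j] for j in range(H)) + b2), 2))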
Convolutional neural networks
http://yann.lecun.com/exdb/lenet/index.html
• Designed to recognize visual patterns directly from pixel images with
minimal preprocessing
• Multiple hidden units are used to learn non-linear combinations of the
original inputs (feature extraction)
▫ Each individual pixel in an image is not very informative
▫ But combinations of pixels are
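
A sketch of the core operation behind this idea, a single 2-D convolution:
each output value combines a neighborhood of pixels into one local
feature. The vertical-edge kernel is illustrative; in a real CNN the
kernel values are learned:

    def conv2d(image, kernel):
        # Slide the kernel over the image; each output pixel is a weighted
        # combination of a small neighborhood of input pixels.
        n, k = len(image), len(kernel)
        return [[sum(image[r + i][c + j] * kernel[i][j]
                     for i in range(k) for j in range(k))
                 for c in range(n - k + 1)]
                for r in range(n - k + 1)]

    image = [[0, 0, 1, 1],          # a dark-to-bright vertical edge
             [0, 0, 1, 1],
             [0, 0, 1, 1],
             [0, 0, 1, 1]]
    kernel = [[-1, 0, 1],           # responds strongly to vertical edges
              [-1, 0, 1],
              [-1, 0, 1]]
    print(conv2d(image, kernel))    # large values where the edge is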
How to Address Overfitting?
• Simplify the parameters/features
▫ Remove unnecessary features
• Regularization
▫ Adjust (penalize) the weights
Regularization
• The MLP can overfit, especially if the number of nodes is large
• A simple way to prevent this is early stopping
▫ Stop the training procedure when the error on the validation set first
starts to increase
• Other techniques are
▫ Consistent Gaussian prior
▫ Weight pruning: shrink the parameter values
▫ Soft weight sharing: groups of parameters are encouraged to take
similar values
▫ Semi-supervised embedding: used with deep-learning NNs
▫ Bayesian inference: can determine the number of hidden units faster
than cross-validation
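
A sketch of the early-stopping rule from this slide, assuming hypothetical
train_one_epoch and validation_error helpers (neither is from the slides):

    import copy

    def fit_with_early_stopping(model, train_one_epoch, validation_error,
                                max_epochs=1000):
        # train_one_epoch(model): hypothetical helper, one pass over the training set.
        # validation_error(model): hypothetical helper returning the held-out error.
        best_error, best_model = float("inf"), copy.deepcopy(model)
        for epoch in range(max_epochs):
            train_one_epoch(model)
            err = validation_error(model)
            if err < best_error:
                best_error, best_model = err, copy.deepcopy(model)
            else:
                break  # validation error started to increase: stop, keep the best
        return best_model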