This presentation describes the various components, classification and applications of Artificial Neural Networks. It also gives an outline of the other soft computing techniques.
2. Soft Computing
• Introduced by Lotfi A. Zadeh, University of California, Berkeley
• Collection of computational methods
• Includes Fuzzy Systems, Neural Networks and Evolutionary Algorithms
• Deployment of soft computing for the solution of machine learning problems has led to a high Machine Intelligence Quotient
“Soft computing differs from hard computing (conventional computing) in its tolerance to imprecision, uncertainty and partial truth”
- Lotfi A. Zadeh
4. Neural Networks
• Simplified models of the biological nervous system
• Processing elements called neurons – inspired by the brain
• Parallel distributed processing
• Characteristics:
– mapping capabilities or pattern association
– robustness
– fault tolerance
– parallel and high speed information processing
– nonlinearity
– adaptivity
[Figure: biological neuron showing dendrites, soma, nucleus, axon and synapse receiving sensory inputs]
6. Simple Model of Artificial Neuron
[Figure: artificial neuron model: inputs x1, x2, ..., xn weighted by w1, w2, ..., wn feed a summation unit Σ that forms the weighted sum, followed by a thresholding unit ƒ that produces the output]
7. Simple Model of Artificial Neuron
Let 𝐼 be the total input received by the soma of the artificial neuron
$I = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n$
$I = \sum_{i=1}^{n} w_i x_i$
To generate the output $y$, the sum $I$ is passed on to a non-linear filter $\phi$ called the Activation function, Transfer function or Squash function:
$y = \phi(I)$
8. Activation Functions: Heaviside function
Very commonly used activation function: Thresholding function
The sum is compared with a threshold value 𝜃. If 𝐼 > 𝜃, then the output is 1; otherwise it is 0
$y = \phi\left(\sum_{i=1}^{n} w_i x_i - \theta\right)$
where $\phi$ is the step function known as the Heaviside function, such that
$\phi(I) = \begin{cases} 1, & I > 0 \\ 0, & I \le 0 \end{cases}$
[Figure: Heaviside step function: the output φ(I) jumps from 0 to 1 at the threshold θ]
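A minimal sketch of the model above (NumPy assumed; the inputs, weights and threshold are illustrative, not from the slides): the weighted sum of the inputs is compared against the threshold through the Heaviside step function.

```python
import numpy as np

def heaviside(value):
    """Heaviside step activation: 1 if the argument is positive, else 0."""
    return 1 if value > 0 else 0

def neuron_output(x, w, theta):
    """Weighted sum of inputs, thresholded at theta."""
    total = np.dot(w, x)             # I = w1*x1 + w2*x2 + ... + wn*xn
    return heaviside(total - theta)  # y = phi(I - theta)

# Illustrative values only
x = np.array([0.5, 0.3, 0.8])        # inputs x1..xn
w = np.array([0.4, 0.7, 0.2])        # weights w1..wn
print(neuron_output(x, w, theta=0.5))   # prints 1, since I = 0.57 > 0.5
```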
9. Activation Functions: Signum function
Also known as Quantizer function
$\phi(I) = \begin{cases} +1, & I > 0 \\ -1, & I \le 0 \end{cases}$
[Figure: signum function: the output φ(I) jumps from -1 to +1 at the threshold θ]
10. Activation Functions: Sigmoidal function
Varies gradually between the asymptotic values 0 and 1 or -1 and +1
$\phi(I) = \dfrac{1}{1 + e^{-\alpha I}}$
where $\alpha$ is the slope parameter
The function is differentiable
Prone to the vanishing gradient problem
When the gradient reaches 0, the network does not learn
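A small sketch (NumPy assumed, illustrative values) of the sigmoid and its derivative; the derivative shrinking toward 0 for large |I| is what drives the vanishing gradient problem noted above.

```python
import numpy as np

def sigmoid(i, alpha=1.0):
    """Binary (logistic) sigmoid with slope parameter alpha."""
    return 1.0 / (1.0 + np.exp(-alpha * i))

def sigmoid_derivative(i, alpha=1.0):
    """Derivative: alpha * sigmoid(I) * (1 - sigmoid(I))."""
    s = sigmoid(i, alpha)
    return alpha * s * (1.0 - s)

for i in (0.0, 2.0, 10.0):
    print(i, sigmoid(i), sigmoid_derivative(i))
# The derivative shrinks toward 0 as |I| grows: gradients vanish.
```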
11. Activation Functions: Hyperbolic tangent function
Also known as tanh function
$\phi(I) = \tanh(I)$
Scaled version of sigmoid function
Leads to vanishing gradient problem in very deep neural networks
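A quick numerical check (NumPy assumed) of the claim that tanh is a scaled version of the sigmoid, using the identity tanh(I) = 2·sigmoid(2I) - 1.

```python
import numpy as np

def sigmoid(i):
    return 1.0 / (1.0 + np.exp(-i))

# tanh(I) = 2 * sigmoid(2I) - 1, i.e. the logistic sigmoid rescaled from (0, 1) to (-1, 1)
i = np.linspace(-5.0, 5.0, 11)
print(np.allclose(np.tanh(i), 2.0 * sigmoid(2.0 * i) - 1.0))   # True
```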
12. Other popular activation functions: ReLU and Softmax
ReLU Function
• Most widely used
• Does not activate all neurons at the same time
• If input is negative the neuron will not get activated
• Overcomes the vanishing gradient problem
• Suited for hidden layers
Softmax Function
Softmax is a type of sigmoid function
Used in handling multi-class classification problems
Ideally used in the output layer of a classifier
$I_n = \dfrac{e^{z_n}}{\sum_{k=1}^{m} e^{z_k}}$
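A minimal softmax sketch (NumPy assumed; the scores are illustrative). Subtracting the maximum before exponentiating is a common numerical-stability trick, not something stated on the slide.

```python
import numpy as np

def softmax(z):
    """Softmax over a vector of scores z: exponentiate, then normalise to sum to 1."""
    shifted = z - np.max(z)   # stability shift; does not change the result
    e = np.exp(shifted)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # illustrative output-layer scores for m = 3 classes
print(softmax(scores), softmax(scores).sum())   # class probabilities, summing to 1
```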
14. Neural Network Architectures
[Figures: single-layer feedforward network, multilayer feedforward network, and recurrent networks]
15. Characteristics of Neural Networks
• Exhibit mapping capabilities
• Learn by examples
• Possess the capability to generalize
• Robust and Fault-tolerant
• Can process information in parallel, at a high speed and in a distributed manner.
17. Supervised Learning
• Every input pattern that is used to train the network is associated with an output pattern, which is the target or desired pattern
• A teacher is assumed to be present to compare the computed output with the expected output and determine the error
• The error can be used to change the network parameters and thereby improve the performance of the network
18. Unsupervised learning
• Target output is not presented to the network
• The system learns on its own by discovering and adapting to structural features in the input patterns
19. Reinforcement Learning
• Though a teacher is present, the expected answer is not presented; the teacher only indicates whether the computed output is correct or not
• A reward is given for a correct answer and a penalty for a wrong answer
• Not a widely used method of learning
20. Hebbian Learning
• Proposed by Hebb (1949)
• Based on correlative weight adjustment
$W = \sum_{i=1}^{n} X_i Y_i^{T}$
where $(X_i, Y_i)$ is the input-output pattern pair
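A small sketch of the correlative weight adjustment above (NumPy assumed; the pattern pairs are illustrative), accumulating the outer products of the input-output pattern pairs.

```python
import numpy as np

# Illustrative bipolar input-output pattern pairs (X_i, Y_i)
X = [np.array([1, -1, 1]), np.array([-1, 1, 1])]
Y = [np.array([1, -1]),    np.array([-1, 1])]

# Hebbian rule: W = sum over pattern pairs of the outer product X_i Y_i^T
W = sum(np.outer(x, y) for x, y in zip(X, Y))
print(W)   # 3x2 weight matrix built from input-output correlations
```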
21. Gradient descent learning
• Based on minimization of error defined in terms of weights and activation function of the network
• The activation function deployed should be differentiable, since the weight update depends on the gradient of the error
• If Δ𝑊𝑖𝑗 is the weight update of the link connecting the ith and jth neuron of two neighboring layers, then
$\Delta W_{ij} = \eta \dfrac{\partial E}{\partial W_{ij}}$
where $\eta$ is the learning rate parameter and $\dfrac{\partial E}{\partial W_{ij}}$ is the error gradient with reference to the weight $W_{ij}$
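A tiny gradient-descent sketch for a single linear neuron with squared error (NumPy assumed; the data, learning rate and number of steps are illustrative). The weights are moved against the error gradient so that the error decreases.

```python
import numpy as np

# Illustrative data: one input pattern x with desired output d, one linear neuron
x, d = np.array([0.5, 1.0]), 1.0
w = np.zeros(2)        # initial weights
eta = 0.1              # learning rate parameter

for _ in range(50):
    y = w @ x                  # neuron output
    grad = (y - d) * x         # dE/dw for the squared error E = 0.5 * (y - d)**2
    w -= eta * grad            # move the weights against the error gradient
print(w, w @ x)                # the output approaches the desired value 1.0
```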
22. Competitive Learning
• Those neurons which respond strongly to input stimuli have their weights updated
• When an input pattern is presented, all neurons in the layer compete and the winning neuron undergoes weight adjustment
• “Winner-takes-all” strategy
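A minimal winner-takes-all sketch (NumPy assumed; the initial weights, learning rate and input are illustrative). One common choice of winner is used here: the neuron whose weight vector is closest to the input.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((3, 2))     # 3 competing neurons, each with a 2-dimensional weight vector
alpha = 0.5                # learning rate

def competitive_update(W, x, alpha):
    """Winner-takes-all: only the neuron closest to the input adjusts its weights."""
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # strongest-responding neuron
    W[winner] += alpha * (x - W[winner])                    # move the winner toward the input
    return winner

x = np.array([0.2, 0.9])   # illustrative input pattern
print(competitive_update(W, x, alpha))
print(W)
```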
23. Stochastic learning
• Weights are adjusted in a probabilistic fashion
• e.g.: simulated annealing deployed by Boltzmann and Cauchy machines
25. Evolution of Neural Networks
Year | Neural Network | Designer
1943 | McCulloch and Pitts Neuron | McCulloch and Pitts
1949 | Hebb Network | Hebb
1958, 1959, 1962, 1988 | Perceptron | Frank Rosenblatt, Block, Minsky and Papert
1960 | ADALINE | Widrow and Hoff
1972 | Kohonen Self-Organizing Feature Map | Kohonen
1982, 1984, 1985, 1986, 1987 | Hopfield Network | John Hopfield and Tank
1986 | Back Propagation Network | Rumelhart, Hinton and Williams
1988 | Counter-Propagation Network | Grossberg
1987-1990 | Adaptive Resonance Theory (ART) | Carpenter and Grossberg
1988 | Radial Basis Function Network | Broomhead and Lowe
1988 | Neocognitron | Fukushima
26. Basic Models
Models of ANN are specified by three basic entities namely
1. Synaptic interconnections of the models
2. Training or learning rules adopted for updating and adjusting the connection weights
3. Activation functions
27. Connections
Five basic types of neuron connection architectures
1. Single-layer feed-forward network
2. Multilayer feed-forward network
3. Single node with its own feedback
4. Single-layer recurrent network
5. Multilayer recurrent network
30. Basic Models: Connection: Single node with its own feedback
[Figures: a single processing element with its own feedback connection, and a competitive net in which processing elements A1, ..., Ai, ..., Am inhibit one another through connections of weight -ε]
34. Basic Models: Learning
Two kinds of learning
1. Parameter Learning: updates the connecting weights in neural network
2. Structure learning: focuses on the change in network structure (no. of processing elements, connection types)
Three categories of learning
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
35. Basic Models: Learning: Supervised Learning
[Figure: supervised learning: input X enters the neural network (weights W) and produces the actual output Y; an error signal generator compares Y with the desired output D and feeds the error (D - Y) back as error signals to adjust W]
36. Basic Models: Learning: Unsupervised Learning
[Figure: unsupervised learning: input X enters the artificial neural network (weights W) and produces the actual output Y, with no desired output or error signal]
37. Basic Models: Learning: Reinforcement Learning
[Figure: reinforcement learning: input X enters the neural network (weights W) and produces the actual output Y; an error signal generator uses the reinforcement signal R to send error signals back to adjust W]
38. Basic Models: Activation Functions
1. Identity function
2. Binary step function
3. Bipolar step function
4. Sigmoidal function
i. Binary sigmoid function
ii. Bipolar sigmoid function
5. Ramp function
39. Basic Models: Activation Functions: Identity function
$f(x) = x$ for all $x$
40. Basic Models: Activation Functions: Binary step function
$f(x) = \begin{cases} 1, & \text{if } x \ge \theta \\ 0, & \text{if } x < \theta \end{cases}$
where $\theta$ represents the threshold value
• Widely used in single-layer nets to convert the net input to a binary output (1 or 0)
• Also known as Heaviside function (Refer slide 8)
41. Basic Models: Activation Functions: Bipolar step function
$f(x) = \begin{cases} 1, & \text{if } x \ge \theta \\ -1, & \text{if } x < \theta \end{cases}$
where $\theta$ represents the threshold value
• Used in single-layer nets to convert the net input to a bipolar output (+1 or -1)
• Also known as signum function (Refer slide 9)
42. Basic Models: Activation Functions: Sigmoidal function
Binary sigmoid function
• Also known as logistic sigmoid function or unipolar sigmoid function
• Range is from 0 to 1
$f(x) = \dfrac{1}{1 + e^{-\lambda x}}$
where $\lambda$ is the steepness parameter
The derivative of this function is
$f'(x) = \lambda f(x)\,[1 - f(x)]$
43. Basic Models: Activation Functions: Sigmoid function (Contd…)
Bipolar sigmoid function
$f(x) = \dfrac{2}{1 + e^{-\lambda x}} - 1 = \dfrac{1 - e^{-\lambda x}}{1 + e^{-\lambda x}}$
where $\lambda$ is the steepness parameter
The derivative of this function is
$f'(x) = \dfrac{\lambda}{2}\,[1 + f(x)]\,[1 - f(x)]$
This function is closely related to the hyperbolic tangent function
46. Weights
• Contain information about the input signal, which is used to solve a problem
• Can be represented in the form of a matrix
• Also known as the connection matrix
• Weights encode long-term memory [LTM] and the activation states short-term memory [STM]
• Assuming "n" processing elements, each with "m" adaptive weights, the weight matrix $W$ is defined by
$W = \begin{bmatrix} w_1^T \\ w_2^T \\ \vdots \\ w_n^T \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix}$
47. Bias
• Almost like another weight, say $w_{0j} = b_j$:
$y_{in_j} = b_j + \sum_{i=1}^{n} x_i w_{ij}$
• Consider the line equation 𝑦 = 𝑚𝑥 + 𝑐, 𝑐 may be considered as a bias
• Two types: positive bias and negative bias
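A small sketch of the net input computation above for a layer of processing elements (NumPy assumed; the values are illustrative, and a row-per-neuron layout of the weight matrix is assumed), with one bias per element acting as an extra weight.

```python
import numpy as np

# n = 2 processing elements, m = 3 adaptive weights each (row j holds neuron j's weights)
W = np.array([[0.2, 0.4, 0.1],
              [0.5, 0.3, 0.7]])
b = np.array([0.1, -0.2])        # one bias b_j per processing element
x = np.array([1.0, 0.5, 2.0])    # input pattern

y_in = b + W @ x                 # net input of each element: bias plus weighted sum of inputs
print(y_in)
```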
48. Threshold
• A value upon which the final output of the network may be calculated
• Used in activation function
• Based on the threshold value, the activation functions are defined and the output is calculated
$f(net) = \begin{cases} 1, & \text{if } net \ge \theta \\ -1, & \text{if } net < \theta \end{cases}$
where 𝜃 is the threshold
49. Learning Rate
• Denoted by 𝛼
• Used to control the amount of weight adjustment at each step of training
• Ranges from 0 to 1 and determines the rate of learning at each time step
50. Momentum Factor
• Faster convergence if momentum factor is added to the weight update process
• Generally done in back propagation network
• Weights from one or more previous training patterns must be saved to use momentum
51. Vigilance Parameter
• Denoted by 𝜌
• Generally used in adaptive resonance theory (ART) network
• Used to control the degree of similarity required for patterns to be assigned to the same cluster
• Ranges approximately from 0.7 to 1 to perform useful work in controlling the number of clusters
52. Applications
• Pattern recognition / Image processing
• Optimization / constraint satisfaction
• Forecasting and risk assessment
• Control Systems
53. References
Rajasekaran, S., & Pai, G. V. (2017). Neural Networks, Fuzzy Systems and Evolutionary Algorithms: Synthesis and Applications. PHI Learning Pvt. Ltd.
Haykin, S. (2010). Neural Networks and Learning Machines, 3/E. Pearson Education India.
Sivanandam, S. N., & Deepa, S. N. (2007). Principles of soft computing. John Wiley & Sons.