This presentation describes the various components, classification and applications of Artificial Neural Networks. It also gives an outline of the other soft computing techniques.
2. Soft Computing
• Introduced by Lotfi A. Zadeh, University of California, Berkeley
• Collection of computational methods
• Includes Fuzzy Systems, Neural Networks and Evolutionary Algorithms
• Deployment of soft computing for the solution of machine learning problems has led to a high Machine Intelligence Quotient
“Soft computing differs from hard computing (conventional computing) in its tolerance to imprecision, uncertainty and partial truth”
- Lotfi A. Zadeh
4. Neural Networks
• Simplified models of the biological nervous system
• Processing elements called neurons – inspired by the brain
• Parallel distributed processing
• Characteristics:
– mapping capabilities or pattern association
– robustness
– fault tolerance
– parallel and high speed information processing
– nonlinearity
– adaptivity
[Figure: biological neuron showing dendrites, soma, nucleus, axon and synapse receiving sensory inputs]
6. Simple Model of Artificial Neuron
[Figure: artificial neuron model: inputs x1, x2, ..., xn weighted by w1, w2, ..., wn feed a summation unit Σ that forms the weighted sum, followed by a thresholding unit ƒ that produces the output]
7. Simple Model of Artificial Neuron
Let 𝐼 be the total input received by the soma of the artificial neuron
$I = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n$
$I = \sum_{i=1}^{n} w_i x_i$
To generate the output $y$, the sum $I$ is passed on to a non-linear filter $\phi$ called the Activation function, Transfer function or Squash function:
$y = \phi(I)$
8. Activation Functions: Heaviside function
Very commonly used activation function: Thresholding function
The sum is compared with a threshold value 𝜃. If 𝐼 > 𝜃, then the output is 1; otherwise it is 0
$y = \phi\left(\sum_{i=1}^{n} w_i x_i - \theta\right)$
where $\phi$ is the step function known as the Heaviside function, such that
$\phi(I) = \begin{cases} 1, & I > 0 \\ 0, & I \le 0 \end{cases}$
[Figure: Heaviside step function: the output φ(I) jumps from 0 to 1 at the threshold θ]
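A minimal sketch of the model above (NumPy assumed; the inputs, weights and threshold are illustrative, not from the slides): the weighted sum of the inputs is compared against the threshold through the Heaviside step function.

```python
import numpy as np

def heaviside(value):
    """Heaviside step activation: 1 if the argument is positive, else 0."""
    return 1 if value > 0 else 0

def neuron_output(x, w, theta):
    """Weighted sum of inputs, thresholded at theta."""
    total = np.dot(w, x)             # I = w1*x1 + w2*x2 + ... + wn*xn
    return heaviside(total - theta)  # y = phi(I - theta)

# Illustrative values only
x = np.array([0.5, 0.3, 0.8])        # inputs x1..xn
w = np.array([0.4, 0.7, 0.2])        # weights w1..wn
print(neuron_output(x, w, theta=0.5))   # prints 1, since I = 0.57 > 0.5
```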
9. Activation Functions: Signum function
Also known as Quantizer function
$\phi(I) = \begin{cases} +1, & I > 0 \\ -1, & I \le 0 \end{cases}$
[Figure: signum function: the output φ(I) jumps from -1 to +1 at the threshold θ]
10. Activation Functions: Sigmoidal function
Varies gradually between the asymptotic values 0 and 1 or -1 and +1
$\phi(I) = \dfrac{1}{1 + e^{-\alpha I}}$
where $\alpha$ is the slope parameter
The function is differentiable
Prone to the vanishing gradient problem
When the gradient reaches 0, the network does not learn
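A small sketch (NumPy assumed, illustrative values) of the sigmoid and its derivative; the derivative shrinking toward 0 for large |I| is what drives the vanishing gradient problem noted above.

```python
import numpy as np

def sigmoid(i, alpha=1.0):
    """Binary (logistic) sigmoid with slope parameter alpha."""
    return 1.0 / (1.0 + np.exp(-alpha * i))

def sigmoid_derivative(i, alpha=1.0):
    """Derivative: alpha * sigmoid(I) * (1 - sigmoid(I))."""
    s = sigmoid(i, alpha)
    return alpha * s * (1.0 - s)

for i in (0.0, 2.0, 10.0):
    print(i, sigmoid(i), sigmoid_derivative(i))
# The derivative shrinks toward 0 as |I| grows: gradients vanish.
```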
11. Activation Functions: Hyperbolic tangent function
Also known as tanh function
$\phi(I) = \tanh(I)$
Scaled version of sigmoid function
Leads to vanishing gradient problem in very deep neural networks
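A quick numerical check (NumPy assumed) of the claim that tanh is a scaled version of the sigmoid, using the identity tanh(I) = 2·sigmoid(2I) - 1.

```python
import numpy as np

def sigmoid(i):
    return 1.0 / (1.0 + np.exp(-i))

# tanh(I) = 2 * sigmoid(2I) - 1, i.e. the logistic sigmoid rescaled from (0, 1) to (-1, 1)
i = np.linspace(-5.0, 5.0, 11)
print(np.allclose(np.tanh(i), 2.0 * sigmoid(2.0 * i) - 1.0))   # True
```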
12. Other popular activation functions: ReLU and Softmax
ReLU Function
• Most widely used
• Does not activate all neurons at the same time
• If input is negative the neuron will not get activated
• Overcomes the vanishing gradient problem
• Suited for hidden layers
Softmax Function
Softmax is a type of sigmoid function
Used in handling multi-class classification problems
Ideally used in the output layer of a classifier
$I_n = \dfrac{e^{z_n}}{\sum_{k=1}^{m} e^{z_k}}$
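A minimal softmax sketch (NumPy assumed; the scores are illustrative). Subtracting the maximum before exponentiating is a common numerical-stability trick, not something stated on the slide.

```python
import numpy as np

def softmax(z):
    """Softmax over a vector of scores z: exponentiate, then normalise to sum to 1."""
    shifted = z - np.max(z)   # stability shift; does not change the result
    e = np.exp(shifted)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # illustrative output-layer scores for m = 3 classes
print(softmax(scores), softmax(scores).sum())   # class probabilities, summing to 1
```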
14. Neural Network Architectures
[Figures: single-layer feedforward network, multilayer feedforward network, and recurrent networks]
15. Characteristics of Neural Networks
• Exhibit mapping capabilities
• Learn by examples
• Possess the capability to generalize
• Robust and Fault-tolerant
• Can process information in parallel, at a high speed and in a distributed manner.
17. Supervised Learning
• Every input pattern that is used to train the network is associated with an output pattern, which is the target or desired pattern
• A teacher is assumed to be present to compare the computed output with the expected output and determine the error
• The error can be used to change the network parameters and thereby improve the performance of the network
18. Unsupervised learning
• Target output is not presented to the network
• The system learns on its own by discovering and adapting to structural features in the input patterns
19. Reinforcement Learning
• Though a teacher is present, the expected answer is not presented; the teacher only indicates whether the computed output is correct or not
• A reward is given for a correct answer and a penalty for a wrong answer
• Not a widely used method of learning
20. Hebbian Learning
• Proposed by Hebb (1949)
• Based on correlative weight adjustment
$W = \sum_{i=1}^{n} X_i Y_i^{T}$
where $(X_i, Y_i)$ is the input-output pattern pair
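A small sketch of the correlative weight adjustment above (NumPy assumed; the pattern pairs are illustrative), accumulating the outer products of the input-output pattern pairs.

```python
import numpy as np

# Illustrative bipolar input-output pattern pairs (X_i, Y_i)
X = [np.array([1, -1, 1]), np.array([-1, 1, 1])]
Y = [np.array([1, -1]),    np.array([-1, 1])]

# Hebbian rule: W = sum over pattern pairs of the outer product X_i Y_i^T
W = sum(np.outer(x, y) for x, y in zip(X, Y))
print(W)   # 3x2 weight matrix built from input-output correlations
```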
21. Gradient descent learning
• Based on minimization of error defined in terms of weights and activation function of the network
• The activation function deployed should be differentiable, since the weight update depends on the gradient of the error
• If Δ𝑊𝑖𝑗 is the weight update of the link connecting the ith and jth neuron of two neighboring layers, then
$\Delta W_{ij} = \eta \dfrac{\partial E}{\partial W_{ij}}$
where $\eta$ is the learning rate parameter and $\dfrac{\partial E}{\partial W_{ij}}$ is the error gradient with reference to the weight $W_{ij}$
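A tiny gradient-descent sketch for a single linear neuron with squared error (NumPy assumed; the data, learning rate and number of steps are illustrative). The weights are moved against the error gradient so that the error decreases.

```python
import numpy as np

# Illustrative data: one input pattern x with desired output d, one linear neuron
x, d = np.array([0.5, 1.0]), 1.0
w = np.zeros(2)        # initial weights
eta = 0.1              # learning rate parameter

for _ in range(50):
    y = w @ x                  # neuron output
    grad = (y - d) * x         # dE/dw for the squared error E = 0.5 * (y - d)**2
    w -= eta * grad            # move the weights against the error gradient
print(w, w @ x)                # the output approaches the desired value 1.0
```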
22. Competitive Learning
• Those neurons which respond strongly to input stimuli have their weights updated
• When an input pattern is presented, all neurons in the layer compete and the winning neuron undergoes weight adjustment
• “Winner-takes-all” strategy
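A minimal winner-takes-all sketch (NumPy assumed; the initial weights, learning rate and input are illustrative). One common choice of winner is used here: the neuron whose weight vector is closest to the input.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((3, 2))     # 3 competing neurons, each with a 2-dimensional weight vector
alpha = 0.5                # learning rate

def competitive_update(W, x, alpha):
    """Winner-takes-all: only the neuron closest to the input adjusts its weights."""
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # strongest-responding neuron
    W[winner] += alpha * (x - W[winner])                    # move the winner toward the input
    return winner

x = np.array([0.2, 0.9])   # illustrative input pattern
print(competitive_update(W, x, alpha))
print(W)
```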
23. Stochastic learning
• Weights are adjusted in a probabilistic fashion
• e.g.: simulated annealing deployed by Boltzmann and Cauchy machines
25. Evolution of Neural Networks
Year | Neural Network | Designer
1943 | McCulloch and Pitts Neuron | McCulloch and Pitts
1949 | Hebb Network | Hebb
1958, 1959, 1962, 1988 | Perceptron | Frank Rosenblatt, Block, Minsky and Papert
1960 | ADALINE | Widrow and Hoff
1972 | Kohonen Self-Organizing Feature Map | Kohonen
1982, 1984, 1985, 1986, 1987 | Hopfield Network | John Hopfield and Tank
1986 | Back Propagation Network | Rumelhart, Hinton and Williams
1988 | Counter-Propagation Network | Grossberg
1987-1990 | Adaptive Resonance Theory (ART) | Carpenter and Grossberg
1988 | Radial Basis Function Network | Broomhead and Lowe
1988 | Neocognitron | Fukushima
26. Basic Models
Models of ANN are specified by three basic entities namely
1. Synaptic interconnections of the models
2. Training or learning rules adopted for updating and adjusting the connection weights
3. Activation functions
27. Connections
Five basic types of neuron connection architectures
1. Single-layer feed-forward network
2. Multilayer feed-forward network
3. Single node with its own feedback
4. Single-layer recurrent network
5. Multilayer recurrent network
30. Basic Models: Connection: Single node with its own feedback
[Figures: a single processing element with its own feedback connection, and a competitive net in which processing elements A1, ..., Ai, ..., Am inhibit one another through connections of weight -ε]
34. Basic Models: Learning
Two kinds of learning
1. Parameter Learning: updates the connecting weights in neural network
2. Structure learning: focuses on the change in network structure (no. of processing elements, connection types)
Three categories of learning
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
35. Basic Models: Learning: Supervised Learning
[Figure: supervised learning: input X enters the neural network (weights W) and produces the actual output Y; an error signal generator compares Y with the desired output D and feeds the error (D - Y) back as error signals to adjust W]
36. Basic Models: Learning: Unsupervised Learning
[Figure: unsupervised learning: input X enters the artificial neural network (weights W) and produces the actual output Y, with no desired output or error signal]
37. Basic Models: Learning: Reinforcement Learning
[Figure: reinforcement learning: input X enters the neural network (weights W) and produces the actual output Y; an error signal generator uses the reinforcement signal R to send error signals back to adjust W]
38. Basic Models: Activation Functions
1. Identity function
2. Binary step function
3. Bipolar step function
4. Sigmoidal function
i. Binary sigmoid function
ii. Bipolar sigmoid function
5. Ramp function
39. Basic Models: Activation Functions: Identity function
$f(x) = x$ for all $x$
40. Basic Models: Activation Functions: Binary step function
$f(x) = \begin{cases} 1, & \text{if } x \ge \theta \\ 0, & \text{if } x < \theta \end{cases}$
where $\theta$ represents the threshold value
• Widely used in single-layer nets to convert the net input to a binary output (1 or 0)
• Also known as Heaviside function (Refer slide 8)
41. Basic Models: Activation Functions: Bipolar step function
$f(x) = \begin{cases} 1, & \text{if } x \ge \theta \\ -1, & \text{if } x < \theta \end{cases}$
where $\theta$ represents the threshold value
• Used in single-layer nets to convert the net input to a bipolar output (+1 or -1)
• Also known as signum function (Refer slide 9)
42. Basic Models: Activation Functions: Sigmoidal function
Binary sigmoid function
• Also known as logistic sigmoid function or unipolar sigmoid function
• Range is from 0 to 1
$f(x) = \dfrac{1}{1 + e^{-\lambda x}}$
where $\lambda$ is the steepness parameter
The derivative of this function is
$f'(x) = \lambda f(x)\,[1 - f(x)]$
43. Basic Models: Activation Functions: Sigmoid function (Contd…)
Bipolar sigmoid function
$f(x) = \dfrac{2}{1 + e^{-\lambda x}} - 1 = \dfrac{1 - e^{-\lambda x}}{1 + e^{-\lambda x}}$
where $\lambda$ is the steepness parameter
The derivative of this function is
$f'(x) = \dfrac{\lambda}{2}\,[1 + f(x)]\,[1 - f(x)]$
This function is closely related to the hyperbolic tangent function
46. Weights
• Contain information about the input signal, which is used to solve a problem
• Can be represented in the form of a matrix
• Also known as the connection matrix
• Weights encode long-term memory [LTM] and the activation states short-term memory [STM]
• Assuming "n" processing elements, each with "m" adaptive weights, the weight matrix $W$ is defined by
$W = \begin{bmatrix} w_1^T \\ w_2^T \\ \vdots \\ w_n^T \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix}$
47. Bias
• Almost like another weight, say $w_{0j} = b_j$:
$y_{in_j} = b_j + \sum_{i=1}^{n} x_i w_{ij}$
• Consider the line equation 𝑦 = 𝑚𝑥 + 𝑐, 𝑐 may be considered as a bias
• Two types: positive bias and negative bias
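A small sketch of the net input computation above for a layer of processing elements (NumPy assumed; the values are illustrative, and a row-per-neuron layout of the weight matrix is assumed), with one bias per element acting as an extra weight.

```python
import numpy as np

# n = 2 processing elements, m = 3 adaptive weights each (row j holds neuron j's weights)
W = np.array([[0.2, 0.4, 0.1],
              [0.5, 0.3, 0.7]])
b = np.array([0.1, -0.2])        # one bias b_j per processing element
x = np.array([1.0, 0.5, 2.0])    # input pattern

y_in = b + W @ x                 # net input of each element: bias plus weighted sum of inputs
print(y_in)
```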
48. Threshold
• A value upon which the final output of the network may be calculated
• Used in activation function
• Based on the threshold value, the activation functions are defined and the output is calculated
$f(net) = \begin{cases} 1, & \text{if } net \ge \theta \\ -1, & \text{if } net < \theta \end{cases}$
where 𝜃 is the threshold
49. Learning Rate
• Denoted by 𝛼
• Used to control the amount of weight adjustment at each step of training
• Ranges from 0 to 1 and determines the rate of learning at each time step
50. Momentum Factor
• Faster convergence if momentum factor is added to the weight update process
• Generally done in back propagation network
• Weights from one or more previous training patterns must be saved to use momentum
51. Vigilance Parameter
• Denoted by 𝜌
• Generally used in adaptive resonance theory (ART) network
• Used to control the degree of similarity required for patterns to be assigned to the same cluster
• Ranges approximately from 0.7 to 1 to perform useful work in controlling the number of clusters
52. Applications
• Pattern recognition / Image processing
• Optimization / constraint satisfaction
• Forecasting and risk assessment
• Control Systems
53. References
Rajasekaran, S., & Pai, G. V. (2017). Neural Networks, Fuzzy Systems and Evolutionary Algorithms: Synthesis and Applications. PHI Learning Pvt. Ltd.
Haykin, S. (2010). Neural Networks and Learning Machines, 3/E. Pearson Education India.
Sivanandam, S. N., & Deepa, S. N. (2007). Principles of soft computing. John Wiley & Sons.