1. Neural Network Activation Functions
Bharatiya Vidya Bhavan's
Sardar Patel Institute of Technology, Munshi Nagar, Andheri (W), Mumbai.
AICTE Sponsored Two Week FDP on "Insights into Intelligent Automation, Machine Learning and Data Science"
19th Oct to 31st Oct 2020
By Dhananjay Kalbande, Professor, Computer Engineering, S.P.I.T., Mumbai
2. NN and ANN
An NN is a simplified model of the biological neural system.
An ANN is a non-linear parameterized function with a restricted output range.
Def. 1 (by DARPA, 1988):
A neural network is a system composed of many simple processing elements operating in parallel whose function is determined by network structure, connection strengths, and the processing performed at computing elements or nodes.
Def. 2 (by Zurada, 1992):
Artificial neural systems, or neural networks, are physical cellular systems which can acquire, store and utilize experiential knowledge.
3. ARTIFICIAL NEURAL NET
• Information-processing system.
• Neurons process the information.
• The signals are transmitted by means of connection links.
• The links possess an associated weight.
• The output signal is obtained by applying activations to the net input.
4. What are Activation Functions?
An activation function performs a mathematical operation on the signal output of a node. It is also known as a Transfer Function.
Why do we use activation functions with neural networks?
An activation function is used to determine the output of a neural network, like yes or no. It maps the resulting values into a range such as 0 to 1 or -1 to 1 etc. (depending upon the function).
5. Importance Of Activation Functions
1. Simple linear operations, namely multiplying the input by weights, adding a bias and summing them across all the inputs arriving to the neuron, are performed in neural networks.
2. It is likely that in certain situations the output derived above takes a large value. When this output is fed into the further layers, it can be transformed to even larger values, making things computationally uncontrollable.
3. This is where the activation functions play a major role, i.e. squashing a real number to a fixed interval (e.g. between -1 and 1, or 0 to 1).
4. The activation functions help the network use the important information and suppress the irrelevant data points.
6. Why should one understand the logic behind Activation Functions?
➔ Even though there are many functions in Python which can be used, why is it important to understand the activation functions?
● Every activation function has different properties, and thus they have different applications.
For example:
❏ When we have a multi-class classification task, sigmoid activation cannot be used in the output layer. Softmax activation is used to classify the output values into different classes.
❏ ReLU, which is a very popular activation function, is always used in hidden layers. (Returns max(0, x).)
7. ANN: Biological perception
[Figure] The figure shows a simple artificial neural net with two input neurons (X1, X2) and one output neuron (Y). The interconnection weights are given by W1 and W2.
8. Activation Unit
9. An activation function decides whether a neuron should be activated or not by calculating the weighted sum and further adding bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.
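As a minimal sketch of that computation, assuming a single neuron with the two inputs and weights from the earlier figure (the numeric values are illustrative, not from the slides):

import numpy as np

# Hypothetical illustration: one neuron with two inputs (cf. the X1, X2, W1, W2 figure).
x = np.array([0.5, -1.2])   # inputs X1, X2 (assumed values)
w = np.array([0.8, 0.3])    # weights W1, W2 (assumed values)
b = 0.1                     # bias (assumed value)

net = np.dot(w, x) + b      # weighted sum plus bias
y = 1 / (1 + np.exp(-net))  # sigmoid activation introduces non-linearity
print(y)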
10. TYPES OF ACTIVATION FUNCTIONS
ACTIVATION LEVEL – DISCRETE OR CONTINUOUS
HARD LIMIT FUNCTION (DISCRETE)
Binary activation function
Bipolar activation function
Identity function
SIGMOIDAL ACTIVATION FUNCTION (CONTINUOUS)
Binary sigmoidal activation function
Bipolar sigmoidal activation function
12. Activation function equations:
Binary step: f(x) = 1, if x >= threshold; 0 if x < threshold
Bipolar step: f(x) = 1, if x >= threshold; -1 if x < threshold
Binary sigmoid: f(x) = 1 / (1 + e^-x)
Bipolar sigmoid: f(x) = (1 - e^-x) / (1 + e^-x)
Hyperbolic tangent (tanh): f(x) = (1 - e^-2x) / (1 + e^-2x)
● Sigmoid
● Tanh
● Softmax
● Winner Takes All
Other activation functions:
● ReLU
● Leaky ReLU
● Adaline
● Delta rule
● Ramp
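The hard-limit (step) equations above translate directly into code; a small sketch, with function names of my own choosing:

import numpy as np

def binary_step(x, threshold=0.0):
    # 1 if x >= threshold, else 0
    return np.where(x >= threshold, 1, 0)

def bipolar_step(x, threshold=0.0):
    # 1 if x >= threshold, else -1
    return np.where(x >= threshold, 1, -1)

x = np.array([-2.0, 0.0, 3.5])
print(binary_step(x))   # [0 1 1]
print(bipolar_step(x))  # [-1  1  1]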
13. Some activation functions have their own drawbacks, which can only be understood if we dive deep and understand the activation functions.
For example: In Tanh and sigmoid activation, if the value is close to 0 or -1 or 1, it is found that learning becomes very slow. This is because the slope of the function becomes almost 0, so the update in weights is slow and the cost (loss) does not decrease rapidly (slow learning).
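This can be checked numerically: the sigmoid's slope is s(x)(1 - s(x)), which shrinks toward 0 as the output saturates. A small illustrative sketch:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Slope of the sigmoid: s'(x) = s(x) * (1 - s(x))
for x in [0.0, 2.0, 5.0, 10.0]:
    s = sigmoid(x)
    print(x, s, s * (1 - s))
# At x = 0 the slope is 0.25, but by x = 10 it is ~4.5e-05,
# so gradient-based weight updates become vanishingly small.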
14. An activation function decides whether a neuron should be activated or not by calculating the weighted sum and further adding bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.
15. SIGMOID (Non-Linear Function)
The sigmoid function is an S-shaped curve, and the major reason for using this activation function is that it scales down large linear values to a value between 0 and 1.
It is used for binary classification, where we set a threshold (generally 0.5): if the value is above the threshold then it belongs to category 1, else category 0.
Here x is the linear weighted sum of the features (x1*w1 + x2*w2 + ……).
If the value of x is too high, the value of the function is close to 1; if x is too low, the value of the function is close to 0.
16. SIGMOID FUNCTION
The sigmoid function is used in the output layer of a binary classification, where the result is either 0 or 1. As the value of the sigmoid function lies between 0 and 1 only, the result can be predicted easily: 1 if the value is greater than 0.5, and 0 otherwise.
17. EXAMPLE WITH PYTHON CODE
import numpy as np

def sigmoid(x):
    s = 1 / (1 + np.exp(-x))
    return s

x = np.arange(-6, 6, 0.01)
sigmoid(x)

# Example: x = 0
# sigmoid(0) = 1/2 = 0.5
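Connecting this to the 0.5 threshold rule from slide 16, a minimal sketch of turning sigmoid outputs into class labels (the helper name and input values are my own assumptions):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def predict_binary(x, threshold=0.5):
    # Category 1 if the sigmoid output exceeds the threshold, else category 0
    return (sigmoid(x) > threshold).astype(int)

net = np.array([-2.0, 0.3, 4.0])  # assumed weighted sums
print(predict_binary(net))        # [0 1 1]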
18. Tanh ACTIVATION FUNCTION
The activation that works almost always better than the sigmoid function is the Tanh function, also known as the Tangent Hyperbolic function.
Value Range: -1 to +1
Nature: non-linear
It is usually used in hidden layers of a neural network, as its values lie between -1 and 1; hence the mean for the hidden layer comes out to be 0 or very close to it. This helps in centering the data by bringing the mean close to 0, which makes learning for the next layer much easier.
20. Tanh Activation python code
f(x) = (1 - e^-2x) / (1 + e^-2x)

import numpy as np

def Tanh(z):
    return np.tanh(z)

# Example: z = 0
Tanh(0)  # returns 0

OR

# Python user-defined function:
import math

def hyperbole(x):
    num = math.exp(-2*x)
    return (1 - num) / (1 + num)
21. RELU
1. Stands for Rectified Linear Unit. It is the most widely used activation function, chiefly implemented in the hidden layers of a neural network.
2. Equation: A(x) = max(0, x). It gives an output x if x is positive and 0 otherwise.
3. Value Range: [0, inf)
4. Uses: ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At a time, only a few neurons are activated, making the network sparse and thus efficient and easy for computation.
23. Python Code For Relu Function
import numpy as np

def Relu(x):
    # np.maximum compares element-wise with 0 (np.max(x, 0) would treat 0 as an axis)
    return np.maximum(0, x)

# Example: if x is 9, the max between 9 and 0 is 9, so the function returns 9.
# Thus all negative outputs of the node will be mapped to 0.
24. LEAKY RELU
1. Leaky Rectified Linear Unit (Leaky ReLU) is an extension of the ReLU function to overcome the dying neuron problem.
2. ReLU returns 0 if the input is negative, and hence the neuron becomes inactive as it does not contribute to gradient flow.
3. Leaky ReLU overcomes this problem by allowing a small value to flow when the input is negative. So, if the learning is too slow using ReLU, one can try using Leaky ReLU to see whether any improvement happens or not.
25. [Figure: ReLU vs. Leaky ReLU]
26. Python code for leaky Relu
def LeakyRelu(x):
    if x < 0:
        return 0.01 * x
    else:
        return x

# Example: x = -1
# The function will return -1 * 0.01 = -0.01
Ans = LeakyRelu(-1)  # Ans = -0.01
27. SOFTMAX ACTIVATION FUNCTION
1) The softmax function is also a type of sigmoid function.
2) It is used when we are trying to handle classification problems.
3) The softmax function is a function that turns a vector of K real values into a vector of K real values that sum to 1.
4) The input values can be positive, negative, zero, or greater than one, but the softmax transforms them into values between 0 and 1, so that they can be interpreted as probabilities.
5) If one of the inputs is small or negative, the softmax turns it into a small probability, and if an input is large, then it turns it into a large probability, but it will always remain between 0 and 1.
6) The softmax function is ideally used in the output layer of the classifier, where we are actually trying to attain the probabilities to define the class of each input.
28. EXAMPLE: Z = [input vector shown in figure]
[SOFTMAX FORMULA figure]
1. First we can calculate the exponential of each element of the input array. This is the term in the top half of the softmax equation. Note that among the input elements, although 8 is only a little larger than 5, 2981 is much larger than 148 due to the effect of the exponential (e^8 ≈ 2981, e^5 ≈ 148).
29. 2. We can obtain the normalization term, the bottom half of the softmax equation, by summing all three exponential terms.
3. Finally, dividing by the normalization term, we obtain the softmax output for each of the three elements. Note that there is not a single output value, because the softmax transforms an array into an array of the same length, in this case 3.
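A quick numeric check of this walkthrough. Only the elements 8 and 5 are given in the text (the full input vector appears only in the figure), so the third element below is an assumption:

import numpy as np

z = np.array([8.0, 5.0, 0.0])  # 8 and 5 come from the slide text; 0.0 is an assumed third element
exp_z = np.exp(z)              # step 1: exponentials (~[2981, 148, 1])
norm = exp_z.sum()             # step 2: normalization term (bottom half of the equation)
softmax = exp_z / norm         # step 3: divide to get probabilities summing to 1
print(softmax)                 # ~[0.9523, 0.0474, 0.0003]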
31. Python code for softmax activation function
def softmax_function(x):
    # x is expected to be a NumPy array
    e = 2.718281       # approximate value of e
    z = e**x           # z = e^x
    z_ = z / z.sum()   # softmax definition
    return z_

OR

import numpy as np

def softmax(z):
    e_x = np.exp(z)
    return e_x / e_x.sum()
32. WINNER TAKES ALL
1. Winner Takes All is based on the competitive learning rule.
2. The connections between the output neurons show the competition between them.
3. One of the neurons would be 'ON', which means it would be the winner, and the others would be 'OFF'.
4. Only the weights of the winner neuron get updated.
33. The learning is based on the premise that one of the neurons in the layer, say the mth, has the maximum response due to input x, as shown in the figure. This neuron is declared the winner.
34. EXAMPLE: Max Net (unsupervised learning)
The single node whose net input is maximum would be active (the winner), and the activations of all other nodes would be inactive. Max Net uses the identity activation function with
f(x) = { x if x > 0
       { 0 if x ≤ 0
Python code:
def Winner_Takes_All(x):
    if x > 0:
        return x
    else:
        return 0
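To illustrate the competition across a whole layer, a hedged sketch of selecting the winner neuron (the response values are assumed, not from the slides):

import numpy as np

responses = np.array([0.2, 0.9, 0.5])  # assumed responses of the output neurons to input x
winner = np.argmax(responses)          # the neuron with the maximum response wins
outputs = np.zeros_like(responses)
outputs[winner] = responses[winner]    # winner stays 'ON'; all others are 'OFF'
print(winner, outputs)                 # 1 [0.  0.9 0. ]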
35. ADALINE (LINEAR BINARY CLASSIFIER)
1. All the input features are multiplied with their respective weights.
2. Add all the multiplied values.
3. The weighted sum is passed through a linear activation function, and its output is compared with the target output, which is used to update the weights.
4. Finally, the output is passed through a non-linear activation function like the unit step function.
36. PYTHON CODE for Adaline
from mlxtend.classifier import Adaline
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=100, random_state=123)
a = Adaline(epochs=50, eta=0.05, random_seed=0)
a.fit(X, y)
37. DELTA RULE
1. Calculate the derivative of f(net).
2. Calculate the difference between the expected output of activation (d) and the current output of activation (o).
3. Multiply the derivative of f(net) with the difference calculated in step 2.
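Putting the three steps together as a weight update, a minimal sketch assuming a sigmoid f(net) and a learning rate eta (all names and values here are illustrative):

import numpy as np

def sigmoid(net):
    return 1 / (1 + np.exp(-net))

x = np.array([0.5, -1.0])        # assumed inputs
w = np.array([0.2, 0.4])         # assumed weights
d = 1.0                          # expected (target) output of activation
eta = 0.1                        # assumed learning rate

net = np.dot(w, x)
o = sigmoid(net)                 # current output of activation
f_prime = o * (1 - o)            # step 1: derivative of f(net) for the sigmoid
error = d - o                    # step 2: difference between expected and current output
w += eta * error * f_prime * x   # step 3: multiply and apply the weight update
print(w)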
38. RAMP FUNCTION
The ramp activation function is used to normalize the output of neural networks within the linear range of the activation function.
39. Python code for Ramp Activation function
# if the output range required is between 0 and 1
def Ramp(x):
    if x > 1:
        return 1
    elif x < 0:
        return 0
    else:
        return x

# if the output range required is between -1 and 1
def Ramp(x):
    if x > 1:
        return 1
    elif x < -1:
        return -1
    else:
        return x
40. Example of Ramp Activation
# if the output range required is between 0 and 1:
If x = -1 the function returns 0
If x = 10 the function returns 1
If x = 0.25 the function returns 0.25

# if the output range required is between -1 and 1:
If x = -7 the function returns -1
If x = 10 the function returns 1
If x = 0.25 the function returns 0.25
41. ……….Just recall
ACTIVATION FUNCTIONS ARE TRANSFER FUNCTIONS.
THEY TRANSFER THE NET INPUT SIGNAL TO AN OUTPUT SIGNAL.
THEY GENERATE THE OUTPUT OF THE NN MODEL.
How can AI make a better society and a better India using non-invasive methods……