Neural Networks in R
Venkat Reddy
Statinfer.com
Data Science Training and R&D
statinfer.com
2
Corporate Training
Classroom Training
Online Training
Contact us
info@statinfer.com
venkat@statinfer.com
Note
•This presentation is just my class notes. The course notes for data science training are written by me, as an aid for myself.
•The best way to treat this is as a high-level summary; the actual session went more in depth and contained detailed information and examples.
•Most of this material was written as informal notes, not intended for publication.
•Please send questions/comments/corrections to info@statinfer.com
•Please check our website statinfer.com for the latest version of this document.
-Venkata Reddy Konasani
(Cofounder statinfer.com)
statinfer.com
3
Contents
•Neural network Intuition
•Neural network and vocabulary
•Neural network algorithm
•Math behind neural network algorithm
•Building the neural networks
•Validating the neural network model
•Neural network applications
•Image recognition using neural networks
5
statinfer.com
Recap of Logistic Regression
•Categorical output YES/NO type
•Using the predictor variables to predict the categorical output
7
statinfer.com
Decision Boundary
Decision Boundary – Logistic Regression
•The line or margin that separates the classes
•Classification algorithms are all about finding the decision boundaries
•It need not be a straight line always
•The final function of our decision boundary looks like
• Y = 1 if wᵀx + w0 > 0; else Y = 0
9
(Scatter plot of x1 vs x2 with a decision boundary separating the two classes)
statinfer.com
Decision Boundary – Logistic Regression
•In logistic regression, the decision boundary can be derived from the regression coefficients and the threshold.
• Consider the logistic regression line p(y) = e^(b0+b1x1+b2x2) / (1 + e^(b0+b1x1+b2x2))
• Suppose if p(y) > 0.5 then class-1, else class-0
• log(p/(1-p)) = b0+b1x1+b2x2
• log(0.5/0.5) = b0+b1x1+b2x2
• 0 = b0+b1x1+b2x2
• b0+b1x1+b2x2 = 0 is the decision boundary line
10
statinfer.com
Decision Boundary – Logistic Regression
•Rewriting it in mx+c form
• X2 = (-b1/b2)X1 + (-b0/b2)
•Anything above this line is class-1, anything below is class-0
• X2 > (-b1/b2)X1 + (-b0/b2) is class-1
• X2 < (-b1/b2)X1 + (-b0/b2) is class-0
• X2 = (-b1/b2)X1 + (-b0/b2) is the tie, with probability 0.5
•We can change the decision boundary by changing the threshold value (here 0.5)
11
statinfer.com
LAB: Logistic Regression and
Decision Boundary
LAB: Logistic Regression
•Dataset: Emp_Productivity/Emp_Productivity.csv
•Filter the data and take a subset from the above dataset. The filter condition is Sample_Set < 3.
•Draw a scatter plot that shows Age on the X-axis and Experience on the Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes).
•Build a logistic regression model to predict Productivity using age and experience.
•Create the confusion matrix.
•Calculate the accuracy and error rates.
13
statinfer.com
LAB: Decision Boundary
•Draw a scatter plot that shows Age on X axis and Experience on Y-axis.
Try to distinguish the two classes with colors or shapes (visualizing the
classes)
•Build a logistic regression model to predict Productivity using age and
experience
•Finally draw the decision boundary for this logistic regression model
14
statinfer.com
Code: Logistic Regression (slides 15–18: code shown as images in the original deck)
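A minimal R sketch of the lab steps above, assuming Emp_Productivity.csv has columns Age, Experience, Productivity and Sample_Set (the exact file layout is an assumption, not taken from the slides):

# Import the data and take the subset used in this lab (Sample_Set < 3)
Emp_Productivity_raw <- read.csv("Emp_Productivity/Emp_Productivity.csv")
Emp_Productivity1 <- Emp_Productivity_raw[Emp_Productivity_raw$Sample_Set < 3, ]

# Scatter plot of Age vs Experience, coloured by the Productivity class
plot(Emp_Productivity1$Age, Emp_Productivity1$Experience,
     col = ifelse(Emp_Productivity1$Productivity == 1, "blue", "red"),
     pch = 19, xlab = "Age", ylab = "Experience")

# Logistic regression model: Productivity ~ Age + Experience
Emp_Productivity_logit <- glm(Productivity ~ Age + Experience,
                              data = Emp_Productivity1, family = binomial())
summary(Emp_Productivity_logit)

# Confusion matrix, accuracy and error rate at the 0.5 threshold
predicted <- ifelse(predict(Emp_Productivity_logit, type = "response") > 0.5, 1, 0)
conf_matrix <- table(Actual = Emp_Productivity1$Productivity, Predicted = predicted)
conf_matrix
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
accuracy
1 - accuracy   # error rate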
Code: Decision Boundary (slides 19–20: code shown as images in the original deck)
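Since the boundary is b0 + b1*Age + b2*Experience = 0, it can be drawn directly from the fitted coefficients. A sketch, continuing from the model object above:

# Decision boundary: Experience = (-b0/b2) + (-b1/b2) * Age
coefs <- coef(Emp_Productivity_logit)
slope     <- -coefs["Age"] / coefs["Experience"]
intercept <- -coefs["(Intercept)"] / coefs["Experience"]

plot(Emp_Productivity1$Age, Emp_Productivity1$Experience,
     col = ifelse(Emp_Productivity1$Productivity == 1, "blue", "red"),
     pch = 19, xlab = "Age", ylab = "Experience")
abline(a = intercept, b = slope, lwd = 2)   # the decision boundary line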
New representation for logistic regression
22
y = e^(β0 + β1·x1 + β2·x2) / (1 + e^(β0 + β1·x1 + β2·x2))
y = 1 / (1 + e^−(β0 + β1·x1 + β2·x2))
In network form: inputs x1 and x2, with weights w1 and w2 and bias w0, feed the combination w0 + w1·x1 + w2·x2 into the output node y.
y = g(Σ wk·xk)
y = g(w0 + w1·x1 + w2·x2), where g(x) = 1 / (1 + e^−x)
statinfer.com
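The same idea in a few lines of R: a weighted sum of the inputs passed through the sigmoid g(). The weights below are placeholders for illustration only, not fitted values:

g <- function(u) 1 / (1 + exp(-u))      # sigmoid function
w0 <- -1; w1 <- 0.5; w2 <- 0.8          # example weights only
x1 <- 2;  x2 <- 3
y  <- g(w0 + w1 * x1 + w2 * x2)         # y = g(w0 + w1x1 + w2x2)
y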
Finding the weights in logistic regression
23
Network form: inputs x1 and x2, with weights w1 and w2 and bias w0, produce w0 + w1·x1 + w2·x2, which feeds the output node y.
out(x) = g(Σ wk·xk)
We find w to minimize Σ i=1..n [ yi − g(Σ wk·xk) ]²
The above output is a non-linear function of a linear combination of inputs – a typical multiple logistic regression line.
statinfer.com
LAB: Non-Linear Decision Boundaries
•Dataset: "Emp_Productivity/Emp_Productivity_All_Sites.csv"
•Draw a scatter plot that shows Age on X axis and Experience on Y-axis.
Try to distinguish the two classes with colors or shapes (visualizing the
classes)
•Build a logistic regression model to predict Productivity using age and
experience
•Finally draw the decision boundary for this logistic regression model
•Create the confusion matrix
•Calculate the accuracy and error rates
25
statinfer.com
Code: Non-Linear Decision Boundaries (slides 26–30: code shown as images in the original deck)
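A sketch of the same steps on the full dataset, assuming Emp_Productivity_All_Sites.csv has the same Age, Experience and Productivity columns as before:

Emp_Productivity_all <- read.csv("Emp_Productivity/Emp_Productivity_All_Sites.csv")

plot(Emp_Productivity_all$Age, Emp_Productivity_all$Experience,
     col = ifelse(Emp_Productivity_all$Productivity == 1, "blue", "red"),
     pch = 19, xlab = "Age", ylab = "Experience")

logit_all <- glm(Productivity ~ Age + Experience,
                 data = Emp_Productivity_all, family = binomial())

# Decision boundary from the coefficients, as before
coefs_all <- coef(logit_all)
abline(a = -coefs_all[1] / coefs_all[3], b = -coefs_all[2] / coefs_all[3], lwd = 2)

# Confusion matrix and accuracy - expect poor accuracy here, because a single
# straight line cannot separate these classes
pred_all <- ifelse(predict(logit_all, type = "response") > 0.5, 1, 0)
conf_all <- table(Actual = Emp_Productivity_all$Productivity, Predicted = pred_all)
conf_all
sum(diag(conf_all)) / sum(conf_all)        # accuracy
1 - sum(diag(conf_all)) / sum(conf_all)    # error rate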
Non-Linear Decision
Boundaries-Issue
Non-Linear Decision Boundaries
32
(Scatter plot of x1 vs x2: the two classes here need a non-linear decision boundary)
statinfer.com
Non-Linear Decision Boundaries-issues
•The logistic regression line doesn't seem to be a good option when we have non-linear decision boundaries
33
statinfer.com
Non-Linear Decision
Boundaries-Solution
Intermediate outputs
35
(Scatter plot of x1 vs x2, with two separate lines drawn across the decision space)
Intermediate output 1 (Model 1): out(x) = g(Σ wk·xk), say h1
Intermediate output 2 (Model 2): out(x) = g(Σ wk·xk), say h2
statinfer.com
The Intermediate output
•Using the x's to directly predict y is challenging.
•We can predict h, the intermediate outputs, which will in turn predict y.
36
(Network diagram: x1, x2 → h1, h2 via weights w11, w12, w21, w22; h1, h2 → y via weights W1, W2)
statinfer.com
Finding the weights for intermediate outputs
37
Intermediate output 1: h1 = out(x) = g(Σ w1k·xk); we find w1 to minimize Σ i=1..n [ h1i − g(Σ w1k·xk) ]²
Intermediate output 2: h2 = out(x) = g(Σ w2k·xk); we find w2 to minimize Σ i=1..n [ h2i − g(Σ w2k·xk) ]²
Final output: y = out(h) = g(Σ Wj·hj); we find W to minimize Σ i=1..n [ yi − g(Σ Wj·hji) ]²
statinfer.com
LAB: Intermediate output
•Dataset: Emp_Productivity/Emp_Productivity_All_Sites.csv
•Filter the data and take the first 74 observations from the above dataset.
•Build a logistic regression model to predict Productivity using age and experience.
•Calculate the prediction probabilities for all the inputs. Store the probabilities in the inter1 variable.
•Filter the data and take observations from row 34 onwards.
•Build a logistic regression model to predict Productivity using age and experience.
•Calculate the prediction probabilities for all the inputs. Store the probabilities in the inter2 variable.
•Build a consolidated model to predict productivity using the inter1 and inter2 variables.
•Create the confusion matrix and find the accuracy and error rates for the consolidated model.
39
statinfer.com
Code: Intermediate output (slides 40–53: code shown as images in the original deck)
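A sketch of the two-stage idea from the lab, assuming the row ranges given above (first 74 rows for one site, row 34 onwards for the other) and the same column names:

Emp_all <- read.csv("Emp_Productivity/Emp_Productivity_All_Sites.csv")

# Sub-model 1 on the first 74 observations
part1  <- Emp_all[1:74, ]
logit1 <- glm(Productivity ~ Age + Experience, data = part1, family = binomial())
Emp_all$inter1 <- predict(logit1, newdata = Emp_all, type = "response")

# Sub-model 2 on the observations from row 34 onwards
part2  <- Emp_all[34:nrow(Emp_all), ]
logit2 <- glm(Productivity ~ Age + Experience, data = part2, family = binomial())
Emp_all$inter2 <- predict(logit2, newdata = Emp_all, type = "response")

# Consolidated model on the two intermediate outputs
logit_final <- glm(Productivity ~ inter1 + inter2, data = Emp_all, family = binomial())

pred_final <- ifelse(predict(logit_final, type = "response") > 0.5, 1, 0)
conf_final <- table(Actual = Emp_all$Productivity, Predicted = pred_final)
conf_final
sum(diag(conf_final)) / sum(conf_final)    # accuracy of the consolidated model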
Neural Network intuition
55
hj = out(x) = g(Σ wjk·xk)
Final output: y = out(h) = g(Σ Wj·hj) = g(Σ Wj · g(Σ wjk·xk))
• So h is a non-linear function of a linear combination of inputs – a multiple logistic regression line.
• y is a non-linear function of a linear combination of the outputs of those logistic regressions.
• y is a non-linear function of a linear combination of non-linear functions of linear combinations of inputs.
We find W to minimize Σ i=1..n [ yi − g(Σ Wj·hj) ]²
We find {Wj} and {wjk} to minimize Σ i=1..n [ yi − g(Σ Wj · g(Σ wjk·xk)) ]²
Neural networks are all about finding the sets of weights {Wj} and {wjk} using the gradient descent method.
statinfer.com
Neural Network intuition
56
Intermediate output 1: h1 = out(x) = g(Σ w1k·xk); we find w1 to minimize Σ i=1..n [ h1i − g(Σ w1k·xk) ]²
Intermediate output 2: h2 = out(x) = g(Σ w2k·xk); we find w2 to minimize Σ i=1..n [ h2i − g(Σ w2k·xk) ]²
Final output: y = out(h) = g(Σ Wj·hj); we find W to minimize Σ i=1..n [ yi − g(Σ Wj·hji) ]²
statinfer.com
The Neural Networks
•The neural networks methodology is similar to the intermediate output method explained above.
•But we will not manually subset the data to create the different models.
•The neural network technique automatically takes care of all the intermediate outputs using hidden layers.
•It works very well for data with non-linear decision boundaries.
•The intermediate output layer in the network is known as the hidden layer.
•In simple terms, neural networks are multi-layer non-linear regression models.
•If we have a sufficient number of hidden layers, then we can estimate any complex non-linear function.
57
statinfer.com
Neural network and vocabulary
58
(Network diagram: Input layer with x1, x2 and a bias node 1; Hidden Layer with h1, h2; Output y)
h1 = 1 / (1 + e^−(w11 + w12·x1 + w22·x2))
h2 = 1 / (1 + e^−(w21 + w13·x1 + w23·x2))
y = 1 / (1 + e^−(W0 + W1·h1 + W2·h2))
• x1, x2 → inputs
• 1 → bias term
• w's are weights
• 1/(1+e^−u) is the sigmoid function
• y is the output
statinfer.com
Why are they called hidden layers?
•A hidden layer “hides” the desired output.
•Instead of predicting the actual output using a single model, we build multiple models that predict intermediate outputs.
•There is no standard way of deciding the number of hidden layers.
59
statinfer.com
The Neural network Algorithm
Algorithm for Finding weights
61
(Network diagram: x1, x2 → h1, h2 via weights w11, w12, w13, w21, w22, w23; h1, h2 → ŷ via weights W1, W2, W3)
• The algorithm is all about finding the weights/coefficients.
• We randomly initialize some weights; calculate the output by supplying the training input; if there is an error, the weights are adjusted to reduce this error.
statinfer.com
The Neural Network Algorithm
•Step 1: Initialization of weights: Randomly select some weights.
•Step 2: Training & Activation: Input the training values and perform the calculations forward.
•Step 3: Error Calculation: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer.
•Step 4: Weight training: Update the weights to reduce the error, recalculate and repeat the process of training & updating the weights for all the examples.
•Step 5: Stopping criteria: Stop the training and weights updating process when the minimum error criterion is met.
62
statinfer.com
Randomly initialize weights
63
(Network diagram: x1, x2 → h1, h2 via weights w11, w12, w13, w21, w22, w23; h1, h2 → ŷ via weights W1, W2, W3)
Step 1: Initialization of weights: Randomly select some weights.
statinfer.com
Training & Activation
64
Training input & calculations – Feed Forward
h1 = 1 / (1 + e^−(w11 + w12·x1 + w22·x2))
h2 = 1 / (1 + e^−(w21 + w13·x1 + w23·x2))
y = 1 / (1 + e^−(W0 + W1·h1 + W2·h2))
Step 2: Input the training values and perform the calculations forward.
statinfer.com
Error Calculation at Output
65
Step 3: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer.
Error = Σ i=1..n [ yi − g( Σ k=1..m wk·hki ) ]²
statinfer.com
Error Calculation at hidden layers
66
Step 3: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer.
Err = Σ i=1..n [ yi − g( Σ k=1..m wk·hki ) ]²
Back Propagation – calculate the error signals backwards:
δk = y(1 − y) · W · Err
statinfer.com
Calculate weight corrections
67
Step 4: Update the weights to reduce the error, recalculate and repeat the process.
Weight corrections Δw11, Δw12, Δw13, Δw21, Δw22, Δw23 and ΔW0, ΔW1, ΔW2 are calculated for every connection.
Err = Σ i=1..n [ yi − g( Σ k=1..m wk·hki ) ]²
δk = y(1 − y) · W · Err
statinfer.com
Update Weights
68
Step 4: Update the weights to reduce the error, recalculate and repeat the process.
w11 := w11 + Δw11;  w12 := w12 + Δw12;  w13 := w13 + Δw13
w21 := w21 + Δw21;  w22 := w22 + Δw22;  w23 := w23 + Δw23
W0 := W0 + ΔW0;  W1 := W1 + ΔW1;  W2 := W2 + ΔW2
Err = Σ i=1..n [ yi − g( Σ k=1..m wk·hki ) ]²
δk = y(1 − y) · W · Err
statinfer.com
Stopping Criteria
69
Step 5: Stop the training and weights updating process when the minimum error criteria is met.
Σ i=1..n [ yi − g( Σ k=1..m wk·hki ) ]²
statinfer.com
Once Again ….Neural network Algorithm
•Step 1: Initialization of weights: Randomly select some weights.
•Step 2: Training & Activation: Input the training values and perform the calculations forward.
•Step 3: Error Calculation: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer.
•Step 4: Weight training: Update the weights to reduce the error, recalculate and repeat the process of training & updating the weights for all the examples.
•Step 5: Stopping criteria: Stop the training and weights updating process when the minimum error criterion is met.
70
statinfer.com
Neural network Algorithm-Demo
Looks like a dataset that can't be separated by a single linear decision boundary/perceptron.
72
(Scatter plot of the two classes, 0 and 1, arranged so that no single straight line can separate them)
statinfer.com
Neural network Algorithm-Demo
•Let's consider a similar but simple classification example
•XOR Gate Dataset
73

Input1 (x1)   Input2 (x2)   Output (y)
     1             1             0
     1             0             1
     0             1             1
     0             0             0
statinfer.com
Randomly initialize weights
74
(Network diagram: x1, x2 → h1, h2 → ŷ with randomly initialized weights; the weight shown on one connection is 0.5)
Step 1: Initialization of weights: Randomly select some weights.
statinfer.com
Activation
75
h1 = 1 / (1 + e^−(w11 + w12·x1 + w22·x2))
h2 = 1 / (1 + e^−(w21 + w13·x1 + w23·x2))
y = 1 / (1 + e^−(W0 + W1·h1 + W2·h2))
For the training example input1 = 1, input2 = 1, output = 0, the forward pass with the initial weights gives h1 = 0.818, h2 = 0.731 and y = 0.7137126.
• In this step we input 1 and 1 as input & expect 0 as output.
• With these weights we got an error of −0.714 at the output layer.
• We need to adjust the weights.
statinfer.com
Back-Propagate Errors
76
At the output:  Ytarget − Yobs = −0.7137126;  Y(1−Y) = 0.20432693;  δ at Y = −0.1458307
At h2:  h2(1−h2) = 0.1966119;  W = 1;  δ at h2 = −0.0286721
At h1:  h1(1−h1) = 0.1491465;  W = −1;  δ at h1 = 0.0217501
Wjk := Wjk + ΔWjk, where ΔWjk = η · yj · δk
η is the learning parameter
δk = yk(1 − yk) · Err  (for hidden layers δk = yk(1 − yk) · wj · Err)
Err = Expected output − Actual output
statinfer.com
Calculate Weight Corrections
77
ΔWjk = η · yj · δk
Using δ at h1 = 0.0217501:  ΔW = 0.1 · 1 · 0.0217501 = 0.00217501
Using δ at h2 = −0.02867:  ΔW = 0.1 · 1 · (−0.02867) = −0.002867
Wjk := Wjk + ΔWjk, where ΔWjk = η · yj · δk
η is the learning parameter
δk = yk(1 − yk) · Err  (for hidden layers δk = yk(1 − yk) · wj · Err)
Err = Expected output − Actual output
statinfer.com
Updated Weights
78
Wjk := Wjk + ΔWjk;  W(new) = W(old) + Correction
The weight of 0.5 becomes 0.5 + 0.002175 = 0.502175
The weight of −1 becomes −1 − 0.002867 = −1.002867
Wjk := Wjk + ΔWjk, where ΔWjk = η · yj · δk
η is the learning parameter
δk = yk(1 − yk) · Err  (for hidden layers δk = yk(1 − yk) · wj · Err)
Err = Expected output − Actual output
statinfer.com
Updated Weights..contd
79
(Network diagram with the updated weight 0.50218 shown)
statinfer.com
Iterations and Stopping Criteria
•This iteration is just for one training example (1,1,0). This is just the
first epoch.
•We repeat the same process of training and updating of weights for all
the data points
•We continue and update the weights until we see there is no
significant change in the error or when the maximum permissible error
criteria is met.
•By updating the weights in this method, we reduce the error slightly.
When the error reaches the minimum point the iterations will be
stopped and the weights will be considered as optimum for this
training set.
80
statinfer.com
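A minimal hand-coded R sketch of this training loop for the XOR data, using the sigmoid network and the update rule ΔW = η·y·δ from the previous slides. The starting weights, learning rate and number of passes are arbitrary choices for illustration; with an unlucky seed the loop can settle in a local minimum and need a re-run:

g <- function(u) 1 / (1 + exp(-u))           # sigmoid

X <- matrix(c(1, 1,  1, 0,  0, 1,  0, 0), ncol = 2, byrow = TRUE)   # XOR inputs
y <- c(0, 1, 1, 0)                                                  # XOR outputs

set.seed(1)
w1  <- matrix(runif(6, -1, 1), nrow = 3)     # Step 1: bias + 2 inputs -> 2 hidden nodes
w2  <- runif(3, -1, 1)                       # Step 1: bias + 2 hidden -> 1 output
eta <- 0.5                                   # learning parameter

for (pass in 1:20000) {
  for (i in 1:nrow(X)) {
    xi  <- c(1, X[i, ])                      # input with the bias term
    h   <- g(as.vector(xi %*% w1))           # Step 2: hidden layer outputs
    hb  <- c(1, h)
    out <- g(sum(hb * w2))                   # Step 2: network output

    err     <- y[i] - out                    # Step 3: error at the output
    delta_o <- out * (1 - out) * err         # delta at the output
    delta_h <- h * (1 - h) * w2[-1] * delta_o   # Step 3: deltas at the hidden nodes

    w2 <- w2 + eta * hb * delta_o            # Step 4: update hidden -> output weights
    w1 <- w1 + eta * xi %o% delta_h          # Step 4: update input -> hidden weights
  }                                          # Step 5 here is simply a fixed number of passes
}

round(g(cbind(1, g(cbind(1, X) %*% w1)) %*% w2), 3)   # predictions close to 0, 1, 1, 0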
XOR Gate final NN Model
81
statinfer.com
Building the Neural network
The good news is..
•We don't need to write the code for weight calculation and updating.
•There are readymade codes, libraries and packages available in R.
•The gradient descent method is not very easy to understand for non-mathematics students.
•Neural network tools don't expect the user to write the code for the full-length back propagation algorithm.
83
statinfer.com
Building the neural network in R
•We have a couple of packages available in R
• nnet
• neuralnet
•We need to mention the dataset, input, output & number of hidden layers as input.
•Neural network calculations are very complex. The algorithm may take some time to produce the results.
•One needs to be careful while setting the parameters. The runtime changes based on the input parameter values.
84
statinfer.com
LAB: Building the neural network in R
•Build a neural network for XOR data
86
statinfer.com
Code: Building the neural network in R
87
statinfer.com
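A sketch of the XOR model with the neuralnet package, using the options discussed on the next slide; the construction of the XOR data frame is an assumption:

library(neuralnet)

xor_data <- data.frame(input1 = c(1, 1, 0, 0),
                       input2 = c(1, 0, 1, 0),
                       output = c(0, 1, 1, 0))

set.seed(100)
xor_nn <- neuralnet(output ~ input1 + input2, data = xor_data,
                    hidden = 2, stepmax = 1e+07, threshold = 0.00001,
                    linear.output = FALSE)

plot(xor_nn)               # network diagram with the final weights
xor_nn$net.result[[1]]     # fitted outputs; should be close to 0, 1, 1, 0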
R Code Options
• neuralnet(Productivity~Age+Experience, data=Emp_Productivity_raw, hidden=2, stepmax = 1e+07, threshold=0.00001, linear.output = FALSE)
•hidden: the number of hidden layers in the neural network. It is actually the number of nodes; we can input a vector to add more hidden layers.
•stepmax:
• The maximum number of steps while executing the algorithm.
• Sometimes we may need more than 100,000 steps for the algorithm to converge.
• Sometimes we may get an error "Algorithm didn't converge with the default stepmax"; we need to increase the stepmax parameter value in such cases.
• Additional info
• One epoch = one complete run through the training data. If epoch = 500 then the algorithm sees the entire data set 500 times.
• One iteration is one "batch" of data passed through the algorithm (steps). If the batch size is the same as the full training data then iterations equal epochs.
88
statinfer.com
R Code Options
•threshold
• Connected to the weight optimization on the error function.
• By default, neuralnet stops once the partial derivatives of the error function become smaller than 0.01.
• It is used as a stopping criterion: if the partial derivatives of the error function reach this threshold then the algorithm stops.
• A lower threshold value forces the algorithm to run more iterations, for higher accuracy.
•The output is expected to be linear by default. We need to specifically mention linear.output = FALSE for classification problems.
89
statinfer.com
Code: Building the neural network in R (slides 90–93: code shown as images in the original deck. Execute a couple of times to get zero error.)
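To score data with the fitted network, the package's compute() function can be used; a sketch continuing from the XOR model above:

nn_pred <- compute(xor_nn, xor_data[, c("input1", "input2")])
predicted_class <- ifelse(nn_pred$net.result > 0.5, 1, 0)

conf <- table(Actual = xor_data$output, Predicted = predicted_class)
conf
sum(diag(conf)) / sum(conf)   # accuracy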
Lab: Building Neural network on Employee
productivity data
•Dataset: Emp_Productivity/Emp_Productivity.csv
•Draw a 2D graph between age, experience and productivity
•Build neural network algorithm to predict the productivity based on
age and experience
•Plot the neural network with final weights
•Increase the hidden layers and see the change in accuracy
94
statinfer.com
Code: Neural network on Employee productivity data (slides 95–101: code shown as images in the original deck)
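A sketch of the employee productivity network, assuming the same Emp_Productivity columns as in the earlier labs:

library(neuralnet)

Emp_Productivity_raw <- read.csv("Emp_Productivity/Emp_Productivity.csv")

set.seed(200)
emp_nn <- neuralnet(Productivity ~ Age + Experience,
                    data = Emp_Productivity_raw, hidden = 2,
                    stepmax = 1e+07, linear.output = FALSE)

plot(emp_nn)   # plot the network with the final weights

# Accuracy on the training data
nn_out <- compute(emp_nn, Emp_Productivity_raw[, c("Age", "Experience")])
pred   <- ifelse(nn_out$net.result > 0.5, 1, 0)
conf   <- table(Actual = Emp_Productivity_raw$Productivity, Predicted = pred)
conf
sum(diag(conf)) / sum(conf)

# Increase the hidden nodes/layers, e.g. hidden = c(3, 2), and compare the accuracy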
There can be many solutions
102–104
(Three different sets of final weights – Set-1, Set-2 and Set-3 – each a valid solution for the same data)
statinfer.com
Local vs. Global Minimum
• The neural network might give different results with different starting weights.
• The algorithm tries to find a local minimum rather than the global minimum.
• There can be many local minima, which means there can be many solutions to the neural network problem.
• We need to perform validation checks before choosing the final model.
106
(Error curve showing a local minimum and the global minimum)
statinfer.com
Hidden layers and their role
Multi Layer Neural Network
108
(Network diagram: inputs x1, x2 and a bias node; two hidden layers with nodes H11, H12 and H21, H22; output Y)
statinfer.com
The role of hidden layers
109
• The first hidden layer
• The first layer is nothing but linear decision boundaries
• The simple logistic regression line outputs
• We can see them as multiple lines on the decision space
(Scatter plot of the two classes with several straight lines – the first hidden layer outputs – drawn across the decision space)
statinfer.com
The role of hidden layers
110
• The second hidden layer
• The second layer combines these lines and forms simple decision boundary shapes
• The third hidden layer forms even more complex shapes within the boundaries generated by the second layer.
• You can imagine all these layers together dividing the whole objective space into multiple decision boundary shapes; the cases within a shape are class-1, outside the shape are class-2
(Scatter plot of the two classes with the lines combined into closed decision-boundary shapes enclosing the class-1 regions)
statinfer.com
The Number of hidden layers
•There is no concrete rule to choose the right number. We need to choose by trial and error validation.
•Too few hidden layers might result in imperfect models. The error rate will be high.
•A high number of hidden layers might lead to over-fitting, but this can be identified by using some validation techniques.
•The final number is based on the number of predictor variables, the training data size and the complexity of the target.
•When we are in doubt, it's better to go with many hidden nodes than few. It will ensure higher accuracy. The training process will be slower though.
•Cross validation and testing error can help us in determining the model with optimal hidden layers.
112
statinfer.com
LAB: Digit Recognizer
• Take an image of a handwritten single digit, and determine what that digit is.
• Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service. The original scanned digits are binary and of different sizes and orientations; the images here have been deslanted and size normalized, resulting in 16 x 16 grayscale images (Le Cun et al., 1990).
• The data are in two gzipped files, and each line consists of the digit id (0-9) followed by the 256 grayscale values.
• Build a neural network model that can be used as the digit recognizer.
• Use the test dataset to validate the true classification power of the model.
• What is the final accuracy of the model?
114
statinfer.com
Code: Digit Recognizer
#Importing test and training data - USPS Data
#(adjust the file paths to the local location of zip.train.txt and zip.test.txt)
digits_train <- read.table("D:Google DriveTrainingDatasetsDigit RecognizerUSPSzip.train.txt", quote="\"", comment.char="")
digits_test <- read.table("D:Google DriveTrainingDatasetsDigit RecognizerUSPSzip.test.txt", quote="\"", comment.char="")
dim(digits_train)
col_names <- names(digits_train[,-1])
label_levels <- names(table(digits_train$V1))

#Lets see some images.
for(i in 1:10)
{
  data_row <- digits_train[i,-1]
  pixels = matrix(as.numeric(data_row), 16, 16, byrow=TRUE)
  image(pixels, axes = FALSE)
  title(main = paste("Label is", digits_train[i,1]), font.main = 4)
}
115
statinfer.com
Code: Digit Recognizer
116
statinfer.com
Code: Digit Recognizer
#####Creating multiple columns for multiple outputs
#####We need these variables while building the model
digit_labels<-data.frame(label=digits_train[,1])
for (i in 1:10)
{
digit_labels<-cbind(digit_labels, digit_labels$label==i-1)
names(digit_labels)[i+1]<-paste("l",i-1,sep="")
}
label_names<-names(digit_labels[,-1])
#Update the training dataset
digits_train1<-cbind(digits_train,digit_labels)
names(digits_train1)
#formula y~. doesn't work in neuralnet function
model_form <- as.formula(paste(paste(label_names, collapse = " + "), "~", paste(col_names, collapse = " + ")))
117
statinfer.com
Code: Digit Recognizer (slides 118–122: code shown as images in the original deck)
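A sketch of the remaining steps: train on model_form, score the test file and measure accuracy. The hidden-layer size and the use of which.max to pick the predicted digit are assumptions, not the exact code from the slides, and training on 256 inputs can take a long time:

library(neuralnet)

set.seed(300)
digit_nn <- neuralnet(model_form, data = digits_train1,
                      hidden = 10, linear.output = FALSE,
                      lifesign = "minimal")

# Score the test data (first column is the label, the rest are the 256 pixels)
test_pred  <- compute(digit_nn, digits_test[, -1])
pred_digit <- apply(test_pred$net.result, 1, which.max) - 1   # columns l0..l9 -> digits 0..9

# Accuracy of the model on the test dataset
conf <- table(Actual = digits_test$V1, Predicted = pred_digit)
sum(diag(conf)) / sum(conf)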
Real-world applications
•Self driving car by taking the video as input
•Speech recognition
•Face recognition
•Cancer cell analysis
•Heart attack predictions
•Currency predictions and stock price predictions
•Credit card default and loan predictions
•Marketing and advertising by predicting the response probability
•Weather forecasting and rainfall prediction
124
statinfer.com
Real-world applications
•Face recognition :
• https://www.youtube.com/watch?v=57VkfXqJ1LU
• https://www.youtube.com/watch?v=xVQLBbXdVUY
•Autonomous car software
• https://www.youtube.com/watch?v=gG72-SjwxAM
125
statinfer.com
Drawbacks of Neural Networks
•There is no real theory that explains how to choose the number of hidden layers.
•Takes a lot of time when the input data is large; needs powerful computing machines.
•Difficult to interpret the results. Very hard to interpret and measure the impact of individual predictors.
•It's not easy to choose the right training sample size and learning rate.
•The local minimum issue: the gradient descent algorithm produces the optimal weights for a local minimum; the global minimum of the error function is not guaranteed.
127
statinfer.com
Why the name neural network?
•The neural network algorithm for solving complex learning problems is inspired by the human brain.
•Our brains are a huge network of processing elements. They contain a network of billions of neurons.
•In our brain, a neuron receives input from other neurons. The inputs are combined and sent to the next neuron.
•The artificial neural network algorithm is built on the same logic.
129
statinfer.com
Why the name neural network?
130
Dendrites → Input (X)
Cell body → Processor (Σwx)
Axon → Output (Y)
statinfer.com
Conclusion
•Neural networks are a vast subject. Many data scientists focus solely on neural network techniques.
•In this session we practiced the introductory concepts only. Neural networks have many more advanced techniques. There are many algorithms other than back propagation.
•Neural networks work particularly well on some classes of problems, like image recognition.
•Neural network algorithms are very calculation intensive. They require highly efficient computing machines. Large datasets take a significant amount of runtime in R. We need to try different types of options and packages.
•Currently there is a lot of exciting research going on around neural networks.
•After gaining sufficient knowledge in this basic session, you may want to explore reinforcement learning, deep learning, etc.
132
statinfer.com
Appendix
Math- How to update the weights?
•We update the weights backwards by iteratively calculating the error.
•The weights are updated using the gradient descent method or the delta rule, also known as the Widrow-Hoff rule.
•First we calculate the weight corrections for the output layer, then we take care of the hidden layers.
135
statinfer.com
Math- How to update the weights?
• Wjk := Wjk + ΔWjk
• where ΔWjk = η · yj · δk
• η is the learning parameter
• δk = yk(1 − yk) · Err (for hidden layers δk = yk(1 − yk) · wj · Err)
• Err = Expected output − Actual output
•The weight correction is calculated based on the error function.
•The new weights are chosen in such a way that the final error in the network is minimized.
136
statinfer.com
Math-How does the delta rule
work?
How does the delta rule work?
• Let's consider a simple example to understand weight updating using the delta rule.
138
• If we are building a simple logistic regression line, we would like to find the weights using the weight update rule.
• Y = 1/(1+e^−wx) is the equation.
• We are searching for the optimal w for our data.
• Let w be 1.
• Y = 1/(1+e^−x) is the initial equation.
• The error in our initial step is 3.59.
• To reduce the error we will add a delta to w and make it 1.5.
• Now w is 1.5 (blue line).
• Y = 1/(1+e^−1.5x) is the updated equation.
• With the updated weight, the error is 1.57.
• We can further reduce the error by increasing w by delta.
statinfer.com
How does the delta rule work?
139
• If we repeat the same process of adding delta and updating weights, we can finally end up with the minimum error.
• The weight at that final step is the optimal weight.
• In this example the weight is 8, and the error is 0.
• Y = 1/(1+e^−8x) is the final equation.
• In this example, we manually changed the weights to reduce the error. This is just for intuition; manual updating is not feasible for complex optimization problems.
• Gradient descent is a scientific optimization method. We update the weights by calculating the gradient of the function.
statinfer.com
Math-How does gradient
descent work?
How does gradient descent work?
•Gradient descent is one of the most common ways to find a local minimum.
•By changing the weights we move towards the minimum value of the error function. The weights are changed by taking steps in the negative direction of the function gradient (derivative).
141
(Plot of error versus weight, showing the descent towards the minimum)
statinfer.com
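A tiny R sketch of the idea on a one-parameter error function (the function here is made up purely for illustration):

error_fn <- function(w) (w - 3)^2 + 1      # example error curve with its minimum at w = 3
gradient <- function(w) 2 * (w - 3)        # derivative of the error function

w   <- 0       # starting weight
eta <- 0.1     # learning rate / step size

for (step in 1:50) {
  w <- w - eta * gradient(w)               # step in the negative gradient direction
}

w              # close to 3, where the error is minimal
error_fn(w)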
Demo-How does gradient
descent work?
Does this method really work?
• We changed the weights; did it reduce the overall error?
• Let's calculate the error with the new weights and see the change.
143
With the updated weights (for example the weight updated to 0.50218), the forward pass for input (1, 1) now gives h1 = 0.818545647, h2 = 0.729364041 and y = 0.706552799.
statinfer.com
Gradient Descent method validation
•With our initial set of weights the overall error was 0.7137: Y actual is 0, Y predicted is 0.7137, error = 0.7137.
•The new weights give us a predicted value of 0.70655.
•In one iteration, we reduced the error from 0.7137 to 0.70655.
•The error is reduced by 1%. If we repeat the same process with multiple epochs and training examples, we can reduce the error further.
144

                  input1   input2   Output (Y-Actual)   Y Predicted    Error
Old Weights            1        1                   0   0.71371259     0.71371259
Updated Weights        1        1                   0   0.706552799    0.706552799
statinfer.com
Thank you
Statinfer.com
statinfer.com
146
Download the course videos and handouts from the below link
https://statinfer.com/course/machine-learning-with-r-2/curriculum/?c=b433a9be3189
More Related Content

What's hot

Machine learning with R
Machine learning with RMachine learning with R
Machine learning with RMaarten Smeets
 
Face recognition and deep learning โดย ดร. สรรพฤทธิ์ มฤคทัต NECTEC
Face recognition and deep learning  โดย ดร. สรรพฤทธิ์ มฤคทัต NECTECFace recognition and deep learning  โดย ดร. สรรพฤทธิ์ มฤคทัต NECTEC
Face recognition and deep learning โดย ดร. สรรพฤทธิ์ มฤคทัต NECTECBAINIDA
 
Feature engineering pipelines
Feature engineering pipelinesFeature engineering pipelines
Feature engineering pipelinesRamesh Sampath
 
Boosted tree
Boosted treeBoosted tree
Boosted treeZhuyi Xue
 
Ml9 introduction to-unsupervised_learning_and_clustering_methods
Ml9 introduction to-unsupervised_learning_and_clustering_methodsMl9 introduction to-unsupervised_learning_and_clustering_methods
Ml9 introduction to-unsupervised_learning_and_clustering_methodsankit_ppt
 
Meetup_Consumer_Credit_Default_Vers_2_All
Meetup_Consumer_Credit_Default_Vers_2_AllMeetup_Consumer_Credit_Default_Vers_2_All
Meetup_Consumer_Credit_Default_Vers_2_AllBernard Ong
 
Feature Reduction Techniques
Feature Reduction TechniquesFeature Reduction Techniques
Feature Reduction TechniquesVishal Patel
 
Matrix decomposition and_applications_to_nlp
Matrix decomposition and_applications_to_nlpMatrix decomposition and_applications_to_nlp
Matrix decomposition and_applications_to_nlpankit_ppt
 
Ml10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsMl10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsankit_ppt
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnDataRobot
 
GLM & GBM in H2O
GLM & GBM in H2OGLM & GBM in H2O
GLM & GBM in H2OSri Ambati
 
Machine learning Algorithms with a Sagemaker demo
Machine learning Algorithms with a Sagemaker demoMachine learning Algorithms with a Sagemaker demo
Machine learning Algorithms with a Sagemaker demoHridyesh Bisht
 
Ml3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsMl3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsankit_ppt
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Gabriel Moreira
 
Algorithms Design Patterns
Algorithms Design PatternsAlgorithms Design Patterns
Algorithms Design PatternsAshwin Shiv
 

What's hot (20)

Machine learning with R
Machine learning with RMachine learning with R
Machine learning with R
 
Face recognition and deep learning โดย ดร. สรรพฤทธิ์ มฤคทัต NECTEC
Face recognition and deep learning  โดย ดร. สรรพฤทธิ์ มฤคทัต NECTECFace recognition and deep learning  โดย ดร. สรรพฤทธิ์ มฤคทัต NECTEC
Face recognition and deep learning โดย ดร. สรรพฤทธิ์ มฤคทัต NECTEC
 
Feature engineering pipelines
Feature engineering pipelinesFeature engineering pipelines
Feature engineering pipelines
 
Boosted tree
Boosted treeBoosted tree
Boosted tree
 
Ml9 introduction to-unsupervised_learning_and_clustering_methods
Ml9 introduction to-unsupervised_learning_and_clustering_methodsMl9 introduction to-unsupervised_learning_and_clustering_methods
Ml9 introduction to-unsupervised_learning_and_clustering_methods
 
07 learning
07 learning07 learning
07 learning
 
Meetup_Consumer_Credit_Default_Vers_2_All
Meetup_Consumer_Credit_Default_Vers_2_AllMeetup_Consumer_Credit_Default_Vers_2_All
Meetup_Consumer_Credit_Default_Vers_2_All
 
Feature Reduction Techniques
Feature Reduction TechniquesFeature Reduction Techniques
Feature Reduction Techniques
 
Matrix decomposition and_applications_to_nlp
Matrix decomposition and_applications_to_nlpMatrix decomposition and_applications_to_nlp
Matrix decomposition and_applications_to_nlp
 
Ml10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsMl10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topics
 
Matrix Factorization
Matrix FactorizationMatrix Factorization
Matrix Factorization
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
 
Ppt shuai
Ppt shuaiPpt shuai
Ppt shuai
 
GLM & GBM in H2O
GLM & GBM in H2OGLM & GBM in H2O
GLM & GBM in H2O
 
QBIC
QBICQBIC
QBIC
 
Machine learning Algorithms with a Sagemaker demo
Machine learning Algorithms with a Sagemaker demoMachine learning Algorithms with a Sagemaker demo
Machine learning Algorithms with a Sagemaker demo
 
Ml3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsMl3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metrics
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
 
Ml7 bagging
Ml7 baggingMl7 bagging
Ml7 bagging
 
Algorithms Design Patterns
Algorithms Design PatternsAlgorithms Design Patterns
Algorithms Design Patterns
 

Similar to Neural Networks made easy

background.pptx
background.pptxbackground.pptx
background.pptxKabileshCm
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learningmilad abbasi
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningMehrnaz Faraz
 
Data Science and Machine Learning with Tensorflow
 Data Science and Machine Learning with Tensorflow Data Science and Machine Learning with Tensorflow
Data Science and Machine Learning with TensorflowShubham Sharma
 
Practice discovering biological knowledge using networks approach.
Practice discovering biological knowledge using networks approach.Practice discovering biological knowledge using networks approach.
Practice discovering biological knowledge using networks approach.Elena Sügis
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionTe-Yen Liu
 
Machine Learning in q/kdb+ - Teaching KDB to Read Japanese
Machine Learning in q/kdb+ - Teaching KDB to Read JapaneseMachine Learning in q/kdb+ - Teaching KDB to Read Japanese
Machine Learning in q/kdb+ - Teaching KDB to Read JapaneseMark Lefevre, CQF
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRUananth
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchGreg Makowski
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Experfy
 
Final Presentation - Edan&Itzik
Final Presentation - Edan&ItzikFinal Presentation - Edan&Itzik
Final Presentation - Edan&Itzikitzik cohen
 
05 contours seg_matching
05 contours seg_matching05 contours seg_matching
05 contours seg_matchingankit_ppt
 
Foundations: Artificial Neural Networks
Foundations: Artificial Neural NetworksFoundations: Artificial Neural Networks
Foundations: Artificial Neural Networksananth
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model佳蓉 倪
 
Neural Art (English Version)
Neural Art (English Version)Neural Art (English Version)
Neural Art (English Version)Mark Chang
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”Dr.(Mrs).Gethsiyal Augasta
 
Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1khairulhuda242
 

Similar to Neural Networks made easy (20)

background.pptx
background.pptxbackground.pptx
background.pptx
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Data Science and Machine Learning with Tensorflow
 Data Science and Machine Learning with Tensorflow Data Science and Machine Learning with Tensorflow
Data Science and Machine Learning with Tensorflow
 
Kx for wine tasting
Kx for wine tastingKx for wine tasting
Kx for wine tasting
 
Practice discovering biological knowledge using networks approach.
Practice discovering biological knowledge using networks approach.Practice discovering biological knowledge using networks approach.
Practice discovering biological knowledge using networks approach.
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
 
Machine Learning in q/kdb+ - Teaching KDB to Read Japanese
Machine Learning in q/kdb+ - Teaching KDB to Read JapaneseMachine Learning in q/kdb+ - Teaching KDB to Read Japanese
Machine Learning in q/kdb+ - Teaching KDB to Read Japanese
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
Iiwas19 yamazaki slide
Iiwas19 yamazaki slideIiwas19 yamazaki slide
Iiwas19 yamazaki slide
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
 
Final Presentation - Edan&Itzik
Final Presentation - Edan&ItzikFinal Presentation - Edan&Itzik
Final Presentation - Edan&Itzik
 
Machine learning meetup
Machine learning meetupMachine learning meetup
Machine learning meetup
 
05 contours seg_matching
05 contours seg_matching05 contours seg_matching
05 contours seg_matching
 
Foundations: Artificial Neural Networks
Foundations: Artificial Neural NetworksFoundations: Artificial Neural Networks
Foundations: Artificial Neural Networks
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
 
Neural Art (English Version)
Neural Art (English Version)Neural Art (English Version)
Neural Art (English Version)
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”
 
Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1
 

More from Venkata Reddy Konasani

Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Venkata Reddy Konasani
 
Model selection and cross validation techniques
Model selection and cross validation techniquesModel selection and cross validation techniques
Model selection and cross validation techniquesVenkata Reddy Konasani
 
Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS Venkata Reddy Konasani
 
Learning Tableau - Data, Graphs, Filters, Dashboards and Advanced features
Learning Tableau -  Data, Graphs, Filters, Dashboards and Advanced featuresLearning Tableau -  Data, Graphs, Filters, Dashboards and Advanced features
Learning Tableau - Data, Graphs, Filters, Dashboards and Advanced featuresVenkata Reddy Konasani
 
Introduction to predictive modeling v1
Introduction to predictive modeling v1Introduction to predictive modeling v1
Introduction to predictive modeling v1Venkata Reddy Konasani
 
Model building in credit card and loan approval
Model building in credit card and loan approval Model building in credit card and loan approval
Model building in credit card and loan approval Venkata Reddy Konasani
 

More from Venkata Reddy Konasani (20)

Transformers 101
Transformers 101 Transformers 101
Transformers 101
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
 
Model selection and cross validation techniques
Model selection and cross validation techniquesModel selection and cross validation techniques
Model selection and cross validation techniques
 
Decision tree
Decision treeDecision tree
Decision tree
 
Step By Step Guide to Learn R
Step By Step Guide to Learn RStep By Step Guide to Learn R
Step By Step Guide to Learn R
 
Credit Risk Model Building Steps
Credit Risk Model Building StepsCredit Risk Model Building Steps
Credit Risk Model Building Steps
 
Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS
 
SAS basics Step by step learning
SAS basics Step by step learningSAS basics Step by step learning
SAS basics Step by step learning
 
Testing of hypothesis case study
Testing of hypothesis case study Testing of hypothesis case study
Testing of hypothesis case study
 
L101 predictive modeling case_study
L101 predictive modeling case_studyL101 predictive modeling case_study
L101 predictive modeling case_study
 
Learning Tableau - Data, Graphs, Filters, Dashboards and Advanced features
Learning Tableau -  Data, Graphs, Filters, Dashboards and Advanced featuresLearning Tableau -  Data, Graphs, Filters, Dashboards and Advanced features
Learning Tableau - Data, Graphs, Filters, Dashboards and Advanced features
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
Online data sources for analaysis
Online data sources for analaysis Online data sources for analaysis
Online data sources for analaysis
 
A data analyst view of Bigdata
A data analyst view of Bigdata A data analyst view of Bigdata
A data analyst view of Bigdata
 
Cluster Analysis for Dummies
Cluster Analysis for DummiesCluster Analysis for Dummies
Cluster Analysis for Dummies
 
ARIMA
ARIMA ARIMA
ARIMA
 
Introduction to predictive modeling v1
Introduction to predictive modeling v1Introduction to predictive modeling v1
Introduction to predictive modeling v1
 
Big data Introduction by Mohan
Big data Introduction by MohanBig data Introduction by Mohan
Big data Introduction by Mohan
 
Data Analyst - Interview Guide
Data Analyst - Interview GuideData Analyst - Interview Guide
Data Analyst - Interview Guide
 
Model building in credit card and loan approval
Model building in credit card and loan approval Model building in credit card and loan approval
Model building in credit card and loan approval
 

Recently uploaded

MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 

Recently uploaded (20)

MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 

Neural Networks made easy

  • 21. New representation for logistic regression
  • 22. New representation for logistic regression 22
  The logistic regression line y = e^(β0+β1x1+β2x2) / (1 + e^(β0+β1x1+β2x2)) can be rewritten as y = 1 / (1 + e^-(β0+β1x1+β2x2))
  As a network: inputs x1, x2 with weights w1, w2 and bias w0 feed a single node computing w0 + w1x1 + w2x2, which is passed through the activation to give y
  y = g(Σ wk xk) = g(w0 + w1x1 + w2x2), where g(x) = 1 / (1 + e^-x) statinfer.com
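  A quick R sketch of this representation; the weight and input values below are placeholders, not fitted estimates:

  # Sigmoid / logistic activation g(x) = 1 / (1 + e^-x)
  g <- function(x) 1 / (1 + exp(-x))

  # y = g(w0 + w1*x1 + w2*x2) -- illustrative weights only
  w0 <- -1; w1 <- 0.5; w2 <- 0.8
  x1 <- 2;  x2 <- 3
  y  <- g(w0 + w1 * x1 + w2 * x2)
  y   # a probability between 0 and 1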
  • 23. Finding the weights in logistic regression 23
  out(x) = g(Σ wk xk)
  We find w to minimize Σ(i=1..n) [yi − g(Σ wk xk)]²
  The above output is a non-linear function of a linear combination of inputs – a typical multiple logistic regression line statinfer.com
  • 25. LAB: Non-Linear Decision Boundaries •Dataset: “Emp_Productivity/ Emp_Productivity_All_Sites.csv” •Draw a scatter plot that shows Age on X axis and Experience on Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes) •Build a logistic regression model to predict Productivity using age and experience •Finally draw the decision boundary for this logistic regression model •Create the confusion matrix •Calculate the accuracy and error rates 25 statinfer.com
  • 26. Code: Non-Linear Decision Boundaries 26 statinfer.com
  • 27. Code: Non-Linear Decision Boundaries 27 statinfer.com
  • 28. Code: Non-Linear Decision Boundaries 28 statinfer.com
  • 29. Code: Non-Linear Decision Boundaries 29 statinfer.com
  • 30. Code: Non-Linear Decision Boundaries 30 statinfer.com
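  A minimal R sketch of the steps in this lab, assuming the file has the Age, Experience and Productivity columns used throughout:

  # Non-linear decision boundary lab (sketch)
  Emp_Productivity_all <- read.csv("Emp_Productivity/Emp_Productivity_All_Sites.csv")

  # Visualize the two classes
  plot(Emp_Productivity_all$Age, Emp_Productivity_all$Experience,
       col = ifelse(Emp_Productivity_all$Productivity == 1, "blue", "red"),
       xlab = "Age", ylab = "Experience")

  # Logistic regression model
  logit_all <- glm(Productivity ~ Age + Experience,
                   data = Emp_Productivity_all, family = binomial())

  # Decision boundary from the coefficients: x2 = (-b1/b2)*x1 + (-b0/b2)
  b <- coef(logit_all)
  abline(a = -b[1] / b[3], b = -b[2] / b[3], lwd = 2)

  # Confusion matrix, accuracy and error rate
  pred_class <- ifelse(predict(logit_all, type = "response") > 0.5, 1, 0)
  conf <- table(Actual = Emp_Productivity_all$Productivity, Predicted = pred_class)
  conf
  accuracy <- sum(diag(conf)) / sum(conf)
  accuracy
  1 - accuracy          # error rate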
  • 33. Non-Linear Decision Boundaries – issues •A logistic regression line doesn't seem to be a good option when we have non-linear decision boundaries 33 statinfer.com
  • 35. Intermediate outputs 35
  Model1: Intermediate output1 out(x) = g(Σ wk xk), say h1
  Model2: Intermediate output2 out(x) = g(Σ wk xk), say h2
  (both models take x1 and x2 as inputs) statinfer.com
  • 36. The Intermediate output •Predicting y directly from the x's is challenging. •Instead, we can predict the intermediate outputs h1 and h2, which in turn predict y (network: x1, x2 → h1, h2 via weights w11, w12, w21, w22 → y via weights W1, W2) 36 statinfer.com
  • 37. Finding the weights for intermediate outputs 37
  Intermediate output1: h1 = out(x) = g(Σ w1k xk)
  Intermediate output2: h2 = out(x) = g(Σ w2k xk)
  Final output: y = out(h) = g(Σ Wj hj)
  We find w1 to minimize Σ(i=1..n) [h1i − g(Σ w1k xk)]²
  We find w2 to minimize Σ(i=1..n) [h2i − g(Σ w2k xk)]²
  We find W to minimize Σ(i=1..n) [yi − g(Σ Wj hji)]² statinfer.com
  • 39. LAB: Intermediate output •Dataset: Emp_Productivity/Emp_Productivity_All_Sites.csv •Filter the data and take the first 74 observations from the above dataset. •Build a logistic regression model to predict Productivity using age and experience •Calculate the prediction probabilities for all the inputs. Store the probabilities in the inter1 variable •Filter the data and take observations from row 34 onwards. •Build a logistic regression model to predict Productivity using age and experience •Calculate the prediction probabilities for all the inputs. Store the probabilities in the inter2 variable •Build a consolidated model to predict productivity using the inter1 and inter2 variables •Create the confusion matrix and find the accuracy and error rates for the consolidated model 39 statinfer.com
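  A minimal R sketch of this intermediate-output lab; the row ranges follow the lab description and the column names are the same Age, Experience and Productivity used earlier:

  # Two overlapping subsets, one logistic model per subset
  emp_all <- read.csv("Emp_Productivity/Emp_Productivity_All_Sites.csv")

  model1 <- glm(Productivity ~ Age + Experience, data = emp_all[1:74, ], family = binomial())
  model2 <- glm(Productivity ~ Age + Experience, data = emp_all[34:nrow(emp_all), ], family = binomial())

  # Intermediate outputs: predicted probabilities on the full data
  emp_all$inter1 <- predict(model1, newdata = emp_all, type = "response")
  emp_all$inter2 <- predict(model2, newdata = emp_all, type = "response")

  # Consolidated model on the intermediate outputs
  model_final <- glm(Productivity ~ inter1 + inter2, data = emp_all, family = binomial())

  pred <- ifelse(predict(model_final, type = "response") > 0.5, 1, 0)
  conf <- table(Actual = emp_all$Productivity, Predicted = pred)
  conf
  sum(diag(conf)) / sum(conf)   # accuracy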
  • 55. Neural Network intuition 55
  hj = out(x) = g(Σ wjk xk)
  Final output: y = out(h) = g(Σ Wj hj) = g(Σ Wj g(Σ wjk xk))
  • So h is a non-linear function of a linear combination of inputs – a multiple logistic regression line
  • y is a non-linear function of a linear combination of the outputs of those logistic regressions
  • That is, y is a non-linear function of a linear combination of non-linear functions of linear combinations of inputs
  We find W to minimize Σ(i=1..n) [yi − g(Σ Wj hj)]²
  We find {Wj} & {wjk} to minimize Σ(i=1..n) [yi − g(Σ Wj g(Σ wjk xk))]²
  Neural networks are all about finding the sets of weights {Wj} and {wjk} using the gradient descent method statinfer.com
  • 56. Neural Network intuition 56
  Intermediate output1: h1 = g(Σ w1k xk); Intermediate output2: h2 = g(Σ w2k xk); Final output: y = g(Σ Wj hj)
  We find w1 to minimize Σ(i=1..n) [h1i − g(Σ w1k xk)]²; we find w2 to minimize Σ(i=1..n) [h2i − g(Σ w2k xk)]²; we find W to minimize Σ(i=1..n) [yi − g(Σ Wj hji)]² statinfer.com
  • 57. The Neural Networks •The neural network methodology is similar to the intermediate output method explained above. •But we will not manually subset the data to create the different models. •The neural network technique automatically takes care of all the intermediate outputs using hidden layers •It works very well for data with non-linear decision boundaries •The intermediate output layer in the network is known as the hidden layer •In simple terms, neural networks are multi-layer non-linear regression models. •If we have a sufficient number of hidden layers, then we can estimate any complex non-linear function 57 statinfer.com
  • 58. Neural network and vocabulary 58
  (network: bias 1 and inputs x1, x2 → hidden layer h1, h2 → output y)
  h1 = 1 / (1 + e^-(w11 + w12x1 + w22x2))
  h2 = 1 / (1 + e^-(w21 + w13x1 + w23x2))
  y = 1 / (1 + e^-(W0 + W1h1 + W2h2))
  • x1, x2 – inputs • 1 – bias term • w's are weights • 1/(1+e^-u) is the sigmoid function • y is the output statinfer.com
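  The vocabulary above maps directly onto a small forward-pass function in R; the weight values in this sketch are placeholders chosen only for illustration:

  # Forward pass for a 2-input, 2-hidden-node, 1-output network
  sigmoid <- function(u) 1 / (1 + exp(-u))

  forward <- function(x1, x2, w, W) {
    # w: hidden-layer weights (biases w11, w21 and input weights w12, w22, w13, w23)
    h1 <- sigmoid(w["w11"] + w["w12"] * x1 + w["w22"] * x2)
    h2 <- sigmoid(w["w21"] + w["w13"] * x1 + w["w23"] * x2)
    # W: output-layer weights (bias W0 and weights W1, W2)
    y  <- sigmoid(W["W0"] + W["W1"] * h1 + W["W2"] * h2)
    c(h1 = unname(h1), h2 = unname(h2), y = unname(y))
  }

  # Placeholder weights, for illustration only
  w <- c(w11 = 0.5, w12 = 0.5, w22 = 0.5, w21 = 0.5, w13 = 0.5, w23 = 0.5)
  W <- c(W0 = 0.5, W1 = 1, W2 = -1)
  forward(1, 1, w, W)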
  • 59. Why are they called hidden layers? •A hidden layer “hides” the desired output. •Instead of predicting the actual output using a single model, build multiple models to predict intermediate output •There is no standard way of deciding the number of hidden layers. 59 statinfer.com
  • 60. The Neural network Algorithm
  • 61. Algorithm for Finding weights 61 (network: x1, x2 → h1, h2 → y with weights w11…w23 and W1, W2, W3) • The algorithm is all about finding the weights/coefficients • We randomly initialize some weights; calculate the output by supplying the training input; if there is an error, the weights are adjusted to reduce this error. statinfer.com
  • 62. The Neural Network Algorithm •Step 1: Initialization of weights: Randomly select some weights •Step 2 : Training & Activation: Input the training values and perform the calculations forward. •Step 3 : Error Calculation: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer •Step 4: Weight training : Update the weights to reduce the error, recalculate and repeat the process of training & updating the weights for all the examples. •Step 5: Stopping criteria: Stop the training and weights updating process when the minimum error criteria is met 62 statinfer.com
  • 63. Randomly initialize weights 63 (network: x1, x2 → h1, h2 → y with randomly chosen weights) Step 1: Initialization of weights: Randomly select some weights statinfer.com
  • 64. Training & Activation 64
  Training input & calculations – Feed Forward
  Step 2: Input the training values and perform the calculations forward
  h1 = 1 / (1 + e^-(w11 + w12x1 + w22x2))
  h2 = 1 / (1 + e^-(w21 + w13x1 + w23x2))
  y = 1 / (1 + e^-(W0 + W1h1 + W2h2)) statinfer.com
  • 65. Error Calculation at Output 65
  Step 3: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer
  Err = Σ(i=1..n) [yi − g(Σ(k=1..m) wk hki)]² statinfer.com
  • 66. Error Calculation at hidden layers 66
  Step 3: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer
  Err = Σ(i=1..n) [yi − g(Σ(k=1..m) wk hki)]²
  Back Propagation – calculate error signals backwards: δk = y(1 − y) × W × Err statinfer.com
  • 67. Calculate weight corrections 67
  Step 4: Update the weights to reduce the error, recalculate and repeat the process
  Weight corrections Δw11, Δw12, Δw13, Δw21, Δw22, Δw23 and ΔW0, ΔW1, ΔW2 are computed from
  Err = Σ(i=1..n) [yi − g(Σ(k=1..m) wk hki)]² and δk = y(1 − y) × W × Err statinfer.com
  • 68. Update Weights 68
  Step 4: Update the weights to reduce the error, recalculate and repeat the process
  w11 := w11 + Δw11; w12 := w12 + Δw12; w13 := w13 + Δw13; w21 := w21 + Δw21; w22 := w22 + Δw22
  W0 := W0 + ΔW0; W1 := W1 + ΔW1; W2 := W2 + ΔW2
  Err = Σ(i=1..n) [yi − g(Σ(k=1..m) wk hki)]²; δk = y(1 − y) × W × Err statinfer.com
  • 69. Stopping Criteria 69
  Step 5: Stop the training and weight-updating process when the minimum error criterion is met
  Σ(i=1..n) [yi − g(Σ(k=1..m) wk hki)]² statinfer.com
  • 70. Once Again ….Neural network Algorithm •Step 1: Initialization of weights: Randomly select some weights •Step 2 : Training & Activation: Input the training values and perform the calculations forward. •Step 3 : Error Calculation: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer •Step 4: Weight training : Update the weights to reduce the error, recalculate and repeat the process of training & updating the weights for all the examples. •Step 5: Stopping criteria: Stop the training and weights updating process when the minimum error criteria is met 70 statinfer.com
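  The five steps can also be sketched as a small from-scratch training loop in R. This is only an illustration of the algorithm with sigmoid activations and the delta rule (learning rate, epoch count and weight initialization are arbitrary); the later labs use the neuralnet package instead:

  # From-scratch sketch of the algorithm for a 2-input, n_hidden-node, 1-output network
  sigmoid <- function(u) 1 / (1 + exp(-u))

  train_nn <- function(X, y, n_hidden = 2, eta = 0.1, epochs = 10000) {
    # Step 1: randomly initialize weights (first column of w and first element of W are biases)
    w <- matrix(runif(n_hidden * (ncol(X) + 1), -1, 1), nrow = n_hidden)  # hidden layer
    W <- runif(n_hidden + 1, -1, 1)                                       # output layer

    for (e in 1:epochs) {
      for (i in 1:nrow(X)) {
        xi <- c(1, X[i, ])                     # add bias input
        # Step 2: training & activation (feed forward)
        h  <- sigmoid(w %*% xi)
        yh <- sigmoid(sum(W * c(1, h)))
        # Step 3: error and error signals (delta rule)
        err     <- y[i] - yh
        delta_y <- yh * (1 - yh) * err
        delta_h <- h * (1 - h) * W[-1] * delta_y
        # Step 4: weight corrections and updates
        W <- W + eta * delta_y * c(1, h)
        w <- w + eta * delta_h %*% t(xi)
      }
      # Step 5 (stopping criterion) is left out here; in practice we would
      # stop once the total error no longer improves meaningfully
    }
    list(w = w, W = W)
  }

  # Example: the XOR data introduced on the next slides
  # X <- rbind(c(1, 1), c(1, 0), c(0, 1), c(0, 0)); y <- c(0, 1, 1, 0)
  # fit <- train_nn(X, y)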
  • 72. Neural network Algorithm-Demo (scatter plot of two interleaved classes of 0s and 1s) Looks like a dataset that can't be separated by using a single linear decision boundary/perceptron 72 statinfer.com
  • 73. Neural network Algorithm-Demo •Let's consider a similar but simpler classification example •XOR Gate Dataset 73
  Input1(x1)  Input2(x2)  Output(y)
  1           1           0
  1           0           1
  0           1           1
  0           0           0 statinfer.com
  • 74. Randomly initialize weights 74 Step 1: Initialization of weights: Randomly select some weights (network: x1, x2 → h1, h2 → y; for example, one of the initial weights is 0.5) statinfer.com
  • 75. Activation 75
  Training example: input1 = 1, input2 = 1, expected output = 0
  h1 = 1 / (1 + e^-(w11 + w12x1 + w22x2)) ≈ 0.818
  h2 = 1 / (1 + e^-(w21 + w13x1 + w23x2)) ≈ 0.731
  y = 1 / (1 + e^-(W0 + W1h1 + W2h2)) ≈ 0.7137
  • In this step we input 1 and 1 as input & expect 0 as output • With these weights we got an error of -0.7137 at the output layer • We need to adjust the weights statinfer.com
  • 76. Back-Propagate Errors 76
  Ytarget − Yobs = −0.7137126; y(1 − y) = 0.20432693; δ at y = −0.1458307
  δ at h2 = (δ at y) × h2(1 − h2) × W = −0.1458307 × 0.1966119 × 1 = −0.0286721
  δ at h1 = (δ at y) × h1(1 − h1) × W = −0.1458307 × 0.1491465 × (−1) = 0.0217501
  Wjk := Wjk + ΔWjk where ΔWjk = η·yj·δk; η is the learning parameter
  δk = yk(1 − yk) × Err (for hidden layers δk = yk(1 − yk) × wj × Err); Err = Expected output − Actual output statinfer.com
  • 77. Calculate Weight Corrections 77
  ΔWjk = η·yj·δk
  For the connection into h1: ΔW = 0.1 × 1 × 0.0217501 = 0.00217501
  For the connection into h2: ΔW = 0.1 × 1 × (−0.02867) = −0.002867
  Wjk := Wjk + ΔWjk where ΔWjk = η·yj·δk; η is the learning parameter; δk = yk(1 − yk) × Err (for hidden layers δk = yk(1 − yk) × wj × Err); Err = Expected output − Actual output statinfer.com
  • 78. Updated Weights 78
  Wjk := Wjk + ΔWjk, i.e. W(new) = W(old) + correction
  W(new) = 0.5 + 0.002175 = 0.502175
  W(new) = −1 + (−0.002867) = −1.002867
  (ΔWjk = η·yj·δk; η is the learning parameter; δk = yk(1 − yk) × Err, for hidden layers δk = yk(1 − yk) × wj × Err; Err = Expected output − Actual output) statinfer.com
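  These numbers can be checked (up to rounding of h1 and h2) with a few lines of R, using the values shown on the slides and the learning rate η = 0.1:

  # Back-propagation arithmetic for the (1, 1, 0) training example
  y_hat <- 0.7137126        # network output from the activation step
  h1    <- 0.818            # hidden-node outputs
  h2    <- 0.731
  err   <- 0 - y_hat        # expected minus actual

  delta_y  <- y_hat * (1 - y_hat) * err        # ~ -0.1458
  delta_h2 <- h2 * (1 - h2) * 1  * delta_y     # output weight W = 1  -> ~ -0.0287
  delta_h1 <- h1 * (1 - h1) * -1 * delta_y     # output weight W = -1 -> ~  0.0218

  eta <- 0.1
  dW_h1 <- eta * 1 * delta_h1                  # input value 1 -> ~  0.0022
  dW_h2 <- eta * 1 * delta_h2                  # input value 1 -> ~ -0.0029
  c(delta_y = delta_y, delta_h1 = delta_h1, delta_h2 = delta_h2, dW_h1 = dW_h1, dW_h2 = dW_h2)

  Small differences from the slide values come from rounding h1 and h2 to three decimals.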
  • 80. Iterations and Stopping Criteria •This iteration is just for one training example (1,1,0). This is just the first epoch. •We repeat the same process of training and updating of weights for all the data points •We continue and update the weights until we see there is no significant change in the error or when the maximum permissible error criteria is met. •By updating the weights in this method, we reduce the error slightly. When the error reaches the minimum point the iterations will be stopped and the weights will be considered as optimum for this training set 80 statinfer.com
  • 81. XOR Gate final NN Model 81 statinfer.com
  • 83. The good news is.. •We don't need to write the code for weight calculation and updating •There are ready-made functions, libraries and packages available in R •The gradient descent method is not very easy to understand for non-mathematics students •Neural network tools don't expect the user to write the code for the full-length back propagation algorithm 83 statinfer.com
  • 84. Building the neural network in R •We have a couple of packages available in R • nnet • neuralnet •We need to mention the dataset, input, output & number of hidden layers as input. •Neural network calculations are very complex. The algorithm may take some time to produce the results •One needs to be careful while setting the parameters. The runtime changes based on the input parameter values 84 statinfer.com
  • 85. LAB: Building the neural network in R
  • 86. LAB: Building the neural network in R •Build a neural network for XOR data 86 statinfer.com
  • 87. Code: Building the neural network in R 87 statinfer.com
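  A minimal sketch of this step, building the XOR model with the neuralnet call discussed on the next slide; the data frame is written out by hand from the XOR truth table and the seed is arbitrary:

  library(neuralnet)

  # XOR truth table from the demo slides
  xor_data <- data.frame(x1 = c(1, 1, 0, 0),
                         x2 = c(1, 0, 1, 0),
                         y  = c(0, 1, 1, 0))

  set.seed(100)   # results vary with the random start weights
  xor_model <- neuralnet(y ~ x1 + x2, data = xor_data,
                         hidden = 2, stepmax = 1e+07,
                         threshold = 0.00001, linear.output = FALSE)

  plot(xor_model)                        # network diagram with the final weights
  round(xor_model$net.result[[1]], 3)    # fitted outputs for the four rows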
  • 88. R Code Options • neuralnet(Productivity~Age+Experience, data=Emp_Productivity_raw, hidden=2, stepmax = 1e+07, threshold=0.00001, linear.output = FALSE) •hidden: the number of hidden nodes in the neural network. We can supply a vector to add more hidden layers •stepmax: • The maximum number of steps while executing the algorithm. • Sometimes we may need more than 100,000 steps for the algorithm to converge. • Sometimes we may get an error "Algorithm didn't converge" with the default stepmax; we need to increase the stepmax parameter value in such cases. • Additional info • One epoch = one complete run through the training data. If epochs = 500 then the algorithm sees the entire dataset 500 times • One iteration is one pass of a "batch" of data through the algorithm (steps). If the batch size equals the full training data, then the number of iterations equals the number of epochs 88 statinfer.com
  • 89. R Code Options •Threshold • Connected to the optimization of weights on the error function • By default, neuralnet keeps updating until the partial derivatives of the error function change by less than 0.01, at which point it stops. • It is used as a stopping criterion: if the partial derivatives of the error function reach this threshold, the algorithm stops. • A lower threshold value forces the algorithm to run more iterations, for more accuracy. •The output is expected to be linear by default. We need to explicitly set linear.output = FALSE for classification problems 89 statinfer.com
  • 90. Code: Building the neural network in R 90 Execute a couple of times to get zero error statinfer.com
  • 91. Code: Building the neural network in R 91 statinfer.com
  • 92. Code: Building the neural network in R 92 statinfer.com
  • 93. Code: Building the neural network in R 93 statinfer.com
  • 94. Lab: Building Neural network on Employee productivity data •Dataset: Emp_Productivity/Emp_Productivity.csv •Draw a 2D graph between age, experience and productivity •Build neural network algorithm to predict the productivity based on age and experience •Plot the neural network with final weights •Increase the hidden layers and see the change in accuracy 94 statinfer.com
  • 95. Code: Neural network on Employee productivity data 95 statinfer.com
  • 96. Code: Neural network on Employee productivity data 96 statinfer.com
  • 97. Code: Neural network on Employee productivity data 97 statinfer.com
  • 98. Code: Neural network on Employee productivity data 98 statinfer.com
  • 99. Code: Neural network on Employee productivity data 99 statinfer.com
  • 100. Code: Neural network on Employee productivity data 100 statinfer.com
  • 101. Code: Neural network on Employee productivity data 101 statinfer.com
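  A minimal sketch of the employee-productivity lab, assuming the Age, Experience and Productivity columns used throughout; the hidden-layer sizes and the seed are illustrative choices:

  library(neuralnet)

  emp <- read.csv("Emp_Productivity/Emp_Productivity.csv")

  # Visualize the classes
  plot(emp$Age, emp$Experience,
       col = ifelse(emp$Productivity == 1, "blue", "red"),
       xlab = "Age", ylab = "Experience")

  # Neural network with two hidden nodes
  set.seed(200)
  emp_model <- neuralnet(Productivity ~ Age + Experience, data = emp,
                         hidden = 2, linear.output = FALSE)
  plot(emp_model)    # plot the network with the final weights

  # Accuracy with a 0.5 cut-off
  pred <- ifelse(emp_model$net.result[[1]] > 0.5, 1, 0)
  table(Actual = emp$Productivity, Predicted = pred)

  # Increase the hidden nodes and compare the accuracy
  emp_model4 <- neuralnet(Productivity ~ Age + Experience, data = emp,
                          hidden = 4, linear.output = FALSE)
  pred4 <- ifelse(emp_model4$net.result[[1]] > 0.5, 1, 0)
  mean(pred4 == emp$Productivity)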
  • 102. There can be many solutions 102 Set-1 statinfer.com
  • 103. There can be many solutions 103 Set-2 statinfer.com
  • 104. There can be many solutions 104 Set-3 statinfer.com
  • 105. Local vs. Global Minimum
  • 106. Local vs. Global Minimum • The neural network might give different results with different start weights. • The algorithm tries to find the local minimum rather than the global minimum. • There can be many local minima, which means there can be many solutions to a neural network problem • We need to perform the validation checks before choosing the final model. 106 (plot: error surface showing a local minimum and the global minimum) statinfer.com
  • 107. Hidden layers and their role
  • 108. Multi Layer Neural Network 108 (diagram: bias 1 and inputs x1, x2 → hidden layer 1 nodes H11, H12 → hidden layer 2 nodes H21, H22 → output Y) statinfer.com
  • 109. The role of hidden layers 109 • The first hidden layer • The first layer is nothing but the linear decision boundaries • The simple logistic regression line outputs • We can see them as multiple lines on the decision space (scatter plot of the two classes with several candidate lines) statinfer.com
  • 110. The role of hidden layers 110 • The second hidden layer • The second layer combines these lines and forms simple decision boundary shapes • The third hidden layer forms even more complex shapes within the boundaries generated by the second layer • You can imagine all these layers together dividing the whole objective space into multiple decision boundary shapes; the cases within a shape are class-1, those outside the shape are class-2 (scatter plot of the two classes with the combined decision regions) statinfer.com
  • 111. The Number of hidden layers
  • 112. The Number of hidden layers •There is no concrete rule to choose the right number. We need to choose by trial-and-error validation •Too few hidden layers might result in imperfect models; the error rate will be high •A high number of hidden layers might lead to over-fitting, but it can be identified by using some validation techniques •The final number is based on the number of predictor variables, the training data size and the complexity of the target. •When we are in doubt, it's better to go with many hidden nodes than few. It will ensure higher accuracy, although the training process will be slower •Cross validation and testing error can help us in determining the model with the optimal hidden layers 112 statinfer.com
  • 114. LAB: Digit Recognizer • Take an image of a handwritten single digit, and determine what that digit is. • Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service. The original scanned digits are binary and of different sizes and orientations; the images here have been deslanted and size-normalized, resulting in 16 x 16 grayscale images (Le Cun et al., 1990). • The data are in two gzipped files, and each line consists of the digit id (0-9) followed by the 256 grayscale values. • Build a neural network model that can be used as the digit recognizer • Use the test dataset to validate the true classification power of the model • What is the final accuracy of the model? 114 statinfer.com
  • 115. Code: Digit Recognizer 115
  #Importing test and training data - USPS Data
  digits_train <- read.table("D:/Google Drive/Training/Datasets/Digit Recognizer/USPS/zip.train.txt", quote="\"", comment.char="")
  digits_test <- read.table("D:/Google Drive/Training/Datasets/Digit Recognizer/USPS/zip.test.txt", quote="\"", comment.char="")
  dim(digits_train)
  col_names <- names(digits_train[,-1])
  label_levels <- names(table(digits_train$V1))
  #Lets see some images.
  for(i in 1:10) {
    data_row <- digits_train[i,-1]
    pixels = matrix(as.numeric(data_row), 16, 16, byrow=TRUE)
    image(pixels, axes = FALSE)
    title(main = paste("Label is", digits_train[i,1]), font.main = 4)
  }
  statinfer.com
  • 117. Code: Digit Recognizer 117
  #####Creating multiple columns for multiple outputs
  #####We need these variables while building the model
  digit_labels <- data.frame(label=digits_train[,1])
  for (i in 1:10) {
    digit_labels <- cbind(digit_labels, digit_labels$label==i-1)
    names(digit_labels)[i+1] <- paste("l", i-1, sep="")
  }
  label_names <- names(digit_labels[,-1])
  #Update the training dataset
  digits_train1 <- cbind(digits_train, digit_labels)
  names(digits_train1)
  #formula y~. doesn't work in neuralnet function
  model_form <- as.formula(paste(paste(label_names, collapse = " + "), "~", paste(col_names, collapse = " + ")))
  statinfer.com
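  A sketch of how the remaining steps could look, training on digits_train1 with model_form and scoring digits_test; the hidden-layer size and other settings here are assumptions for illustration, not the deck's actual choices, and training on 256 inputs can take a long time:

  library(neuralnet)

  # Train a network with one small hidden layer (illustrative settings)
  digit_model <- neuralnet(model_form, data = digits_train1,
                           hidden = 10, linear.output = FALSE,
                           lifesign = "minimal")

  # Score the test data: first column is the label, the rest are the 256 pixels
  test_pred <- compute(digit_model, digits_test[, -1])$net.result

  # Predicted digit = column with the highest output value
  pred_label <- max.col(test_pred) - 1

  # Accuracy on the test set
  mean(pred_label == digits_test$V1)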
  • 124. Real-world applications •Self driving car by taking the video as input •Speech recognition •Face recognition •Cancer cell analysis •Heart attack predictions •Currency predictions and stock price predictions •Credit card default and loan predictions •Marketing and advertising by predicting the response probability •Weather forecasting and rainfall prediction 124 statinfer.com
  • 125. Real-world applications •Face recognition : • https://www.youtube.com/watch?v=57VkfXqJ1LU • https://www.youtube.com/watch?v=xVQLBbXdVUY •Autonomous car software • https://www.youtube.com/watch?v=gG72-SjwxAM 125 statinfer.com
  • 126. Drawbacks of Neural Networks
  • 127. Drawbacks of Neural Networks •No real theory that explains how to choose the number of hidden layers •Takes a lot of time when the input data is large; needs powerful computing machines •Difficult to interpret the results. Very hard to interpret and measure the impact of individual predictors •It's not easy to choose the right training sample size and learning rate. •The local minimum issue: the gradient descent algorithm produces the optimal weights for the local minimum; the global minimum of the error function is not guaranteed 127 statinfer.com
  • 128. Why the name neural network?
  • 129. Why the name neural network? •The neural network algorithm for solving complex learning problems is inspired by the human brain •Our brains are a huge network of processing elements. They contain a network of billions of neurons. •In our brain, a neuron receives input from other neurons. Inputs are combined and sent to the next neuron •The artificial neural network algorithm is built on the same logic. 129 statinfer.com
  • 130. Why the name neural network? 130 Dendrites → Input (X); Cell body → Processor (Σwx); Axon → Output (Y) statinfer.com
  • 132. Conclusion •Neural networks are a vast subject. Many data scientists focus solely on neural network techniques •In this session we practiced the introductory concepts only. Neural networks have many more advanced techniques, and there are many algorithms other than back propagation. •Neural networks work particularly well on certain classes of problems like image recognition. •Neural network algorithms are very calculation intensive. They require highly efficient computing machines. Large datasets take a significant amount of runtime in R. We need to try different types of options and packages. •Currently there is a lot of exciting research going on around neural networks. •After gaining sufficient knowledge in this basic session, you may want to explore reinforcement learning, deep learning, etc. 132 statinfer.com
  • 134. Math- How to update the weights?
  • 135. Math- How to update the weights? •We update the weights backwards by iteratively calculating the error •Weight updating is done using the gradient descent method, or delta rule, also known as the Widrow-Hoff rule •First we calculate the weight corrections for the output layer, then we take care of the hidden layers 135 statinfer.com
  • 136. Math- How to update the weights?
  • Wjk := Wjk + ΔWjk
  • where ΔWjk = η·yj·δk
  • η is the learning parameter
  • δk = yk(1 − yk) × Err (for hidden layers δk = yk(1 − yk) × wj × Err)
  • Err = Expected output − Actual output
  •The weight correction is calculated based on the error function •The new weights are chosen in such a way that the final error in the network is minimized 136 statinfer.com
  • 137. Math-How does the delta rule work?
  • 138. How does the delta rule work? • Let's consider a simple example to understand the weight updating using the delta rule. 138 • Suppose we are building a simple logistic regression line and would like to find the weight using the weight update rule • Y = 1/(1+e^-wx) is the equation • We are searching for the optimal w for our data • Let w be 1 • Y = 1/(1+e^-x) is the initial equation • The error in our initial step is 3.59 • To reduce the error we add a delta to w and make it 1.5 • Now w is 1.5 (blue line) • Y = 1/(1+e^-1.5x) is the updated equation • With the updated weight, the error is 1.57 • We can further reduce the error by increasing w by another delta statinfer.com
  • 139. How does the delta rule work? 139 • If we repeat the same process of adding a delta and updating the weight, we finally end up with the minimum error • The weight at that final step is the optimal weight • In this example the weight is 8, and the error is 0 • Y = 1/(1+e^-8x) is the final equation • In this example, we manually changed the weights to reduce the error. This is just for intuition; manual updating is not feasible for complex optimization problems. • Gradient descent is a principled optimization method: we update the weights by calculating the gradient of the error function. statinfer.com
  • 141. How does gradient descent work? •Gradient descent is one of the well-known ways to find a local minimum •By changing the weights we move towards the minimum value of the error function. The weights are changed by taking steps in the negative direction of the function gradient (derivative). 141 (plot: error vs. weight, stepping down towards the minimum) statinfer.com
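  A compact R illustration of the idea; the quadratic error function, starting weight and learning rate are made up for demonstration, and the point is only that each update steps in the negative direction of the gradient:

  # Gradient descent on a toy error function E(w) = (w - 3)^2
  error_fn <- function(w) (w - 3)^2
  gradient <- function(w) 2 * (w - 3)     # dE/dw

  w   <- 0       # arbitrary starting weight
  eta <- 0.1     # learning rate
  for (step in 1:50) {
    w <- w - eta * gradient(w)            # step in the negative gradient direction
  }
  c(weight = w, error = error_fn(w))      # w approaches 3, error approaches 0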
  • 143. Does this method really work? • We changed the weights; did it reduce the overall error? • Let's calculate the error with the new weights and see the change 143 For x1 = 1, x2 = 1 the updated network (weight 0.50218) gives h1 ≈ 0.818546, h2 ≈ 0.729364 and y ≈ 0.706553 statinfer.com
  • 144. Gradient Descent method validation •With our initial set of weights the overall error was 0.7137 (Y actual = 0, Y predicted = 0.7137, error = 0.7137) •The new weights give us a predicted value of 0.70655 •In one iteration, we reduced the error from 0.7137 to 0.70655 •The error is reduced by about 1%. Repeating the same process over multiple epochs and training examples reduces the error further. 144
  input1  input2  Output (Y actual)  Y predicted   Error
  Old weights:      1   1   0   0.71371259    0.71371259
  Updated weights:  1   1   0   0.706552799   0.706552799 statinfer.com
  • 146. Statinfer.com statinfer.com 146 Download the course videos and handouts from the below link https://statinfer.com/course/machine-learning-with-r-2/curriculum/?c=b433a9be3189