Deep Learning
Vidyasagar Bhargava
Contents
1. Introduction
2. Why deep learning?
3. Fundamentals of deep learning
4. How does deep learning work?
5. Activation Function
6. Training a Neural Network
• How to minimize loss or cost?
• How to move in the right direction?
• Stochastic Gradient Descent
• How to calculate the gradient?
7. Adaptive learning
8. Overfitting
9. Regularization
10. H2O
• Introduction
• H2O’s Deep learning
• Features
• Parameters
• Demo
Introduction
• Deep learning is an enhanced and powerful form of neural network built on several hidden layers (more than two).
• Data comes in many forms, and it is often difficult for linear methods to detect non-linearity in the data. In fact, even non-linear algorithms such as GBMs and decision trees can fail to learn from the data.
• In such cases, a multi-layered neural network, which creates non-linear interactions among the features, often gives a better solution!
Why Deep Learning?
• Neural networks have been around for a long time; only in the past few years have they become so popular.
• Deep learning is powerful because it can learn feature representations in an unsupervised manner, which differs from traditional machine learning algorithms, where we have to handcraft features manually.
• Handcrafted features work well in many domains, but in domains like image classification the data is very high-dimensional, which makes it difficult to craft features that are useful for prediction.
• So deep learning takes the approach of taking all the data and figuring out the best features itself.
A Perceptron
Fundamentals of Deep learning
• At the core of any neural network is a perceptron.
• In a neural network diagram, a circle represents a neuron and a line represents a synapse.
• Synapses have a very simple job: they take the value from a neuron, multiply it by a specific weight, and output the result.
• Neurons are a little more complicated: their job is to add together the outputs from all incoming synapses, add a bias term, and apply the activation function.
• Activation functions allow a neural net to model complex non-linear patterns.
Perceptron Forward Pass
• Step 1 (Synapse): multiply the weights and inputs.
• Step 2 (Neuron): sum them all together and add the bias term.
• Step 3 (Activation function): apply the non-linearity (see the sketch below).
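To make the forward pass concrete, here is a minimal sketch in Python (NumPy), assuming a sigmoid activation; the input, weight, and bias values are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_forward(x, w, b):
    # Step 1 (synapse): multiply inputs by weights
    # Step 2 (neuron): sum them and add the bias term
    z = np.dot(w, x) + b
    # Step 3 (activation function): apply the non-linearity
    return sigmoid(z)

x = np.array([0.5, -1.2, 3.0])   # example inputs
w = np.array([0.4, 0.6, -0.1])   # example weights
b = 0.2                          # bias term
print(perceptron_forward(x, w, b))
```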
Why is bias added ?
• Bias is similar to the intercept term in linear regression.
• It helps achieve better predictions by shifting the decision boundary.
Activation function
• At the core of every activation function is a non-linearity, which transforms the neuron's linear output into a non-linear one.
• There are many, many activation functions.
• Some common activation functions are Sigmoid, TanH, and ReLU.
Importance of Activation function
• Activation functions add non-linearity to our network's function.
• Non-linearity is important because most real-world data is non-linear (see the sketch below).
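A minimal sketch of the three common activations named above, in Python (NumPy); the sample inputs are illustrative.

```python
import numpy as np

def sigmoid(z):
    # Smooth squashing into (0, 1); saturates for large |z|
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes into (-1, 1); a zero-centered cousin of the sigmoid
    return np.tanh(z)

def relu(z):
    # Rectified Linear Unit: passes positives, zeroes out negatives
    return np.maximum(0.0, z)

z = np.linspace(-3, 3, 7)
for f in (sigmoid, tanh, relu):
    print(f.__name__, np.round(f(z), 3))
```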
How to build a neural network with perceptrons?
• The perceptron is the most basic unit of a neural network. However, a single perceptron isn't powerful enough to work on data that is not linearly separable.
• This limitation is why the Multi-Layer Perceptron came into existence.
• We can add a hidden layer between the input layer and the output layer, which gives rise to the Multi-Layer Perceptron (MLP).
• To extend an MLP to a deep neural network, simply add more layers (a sketch follows below).
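A minimal sketch of an MLP forward pass with one hidden layer, reusing the sigmoid from above; the layer sizes and random weights are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    # Hidden layer: weighted sum plus bias, then the non-linearity
    h = sigmoid(W1 @ x + b1)
    # Output layer: same pattern applied to the hidden activations
    return sigmoid(W2 @ h + b2)

rng = np.random.default_rng(0)
x = rng.normal(size=3)                            # 3 input variables
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)     # 4 hidden neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)     # 1 output neuron
print(mlp_forward(x, W1, b1, W2, b2))
```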
Deep Learning Model
• The input layer consists of one neuron per input variable in the data.
• The number of neurons in each hidden layer is up to the user.
• We can find the optimal number of neurons in the hidden layers using a cross-validation strategy.
Applying Neural Network
• To quantify how good our neural network is, we calculate a loss, i.e. an aggregate of the differences between the actual and predicted outputs.
• There are many loss functions, such as cross-entropy loss, mean squared error, etc.
• The loss is written as J(Θ).
• Our goal is to minimize the loss so that the network predicts the output more accurately.
• Note: Θ = W1, W2, ..., Wn (the weights of the network). A worked sketch follows the formulas below.
J(Θ) = (1/N) ∑_{i=1}^{N} loss(f(x^(i); Θ), y^(i))
Θ* = argmin_Θ (1/N) ∑_{i=1}^{N} loss(f(x^(i); Θ), y^(i))
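A sketch of this average loss in Python, using mean squared error as the loss function and a toy one-parameter-pair linear model as f (both illustrative assumptions).

```python
import numpy as np

def mse_loss(y_pred, y_true):
    # Mean squared error: average squared difference over the dataset
    return np.mean((y_pred - y_true) ** 2)

def J(f, theta, X, y):
    # J(Θ) = (1/N) Σ_i loss(f(x^(i); Θ), y^(i))
    y_pred = np.array([f(x_i, theta) for x_i in X])
    return mse_loss(y_pred, y)

# Toy model: a single linear neuron, theta = (w, b)
f = lambda x, th: th[0] * x + th[1]
X = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])
print(J(f, (2.0, 1.0), X, y))   # perfect fit -> loss 0.0
```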
Training a Neural Network
• Now that we have expressed our loss as J(Θ), we will train our neural network to minimize it.
• So the objective is to find the Θ that minimizes the loss function.
• Θ is just the weights of our network.
• So the loss is a function of the model's parameters.
• To minimize the loss, we need to find its lowest point.
How to minimize loss or cost?
• Once the predicted value is computed, the error propagates back layer by layer, and the weights associated with each neuron are recalculated.
• This is known as backpropagation.
• The backpropagation algorithm optimizes the network's performance using a cost function.
• This cost function is minimized using an iterative sequence of steps called the gradient descent algorithm.
Gradient Descent: How to move in the right direction?
• Start at a random point; the goal is to get to the bottom.
• To reach the bottom, we calculate the gradient at this point, which points in the direction of steepest ascent. Since we want to go downhill, we simply multiply by −1 and move in the opposite direction, downward, forming a new point.
• In this way we update our parameters and obtain a new loss.
• We can do this over and over again until we reach the minimum loss (until we reach convergence), as in the sketch below.
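A minimal sketch of this loop on a simple one-dimensional loss J(θ) = θ², whose gradient is 2θ; the starting point and learning rate are illustrative.

```python
def grad_J(theta):
    # Gradient of J(theta) = theta**2
    return 2.0 * theta

theta = 4.0          # random starting point
eta = 0.1            # learning rate (step size)
for step in range(50):
    # Move opposite to the gradient, i.e. downhill
    theta = theta - eta * grad_J(theta)
print(theta)         # approaches the minimum at theta = 0
```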
Stochastic Gradient Descent algorithm
• Initialize Θ randomly
• For N epochs:
o For each training example (x, y):
• Compute the loss gradient ∂J(Θ)/∂Θ
• Update Θ with the update rule: Θ := Θ − η · ∂J(Θ)/∂Θ
Note: Θ = W1, W2, ..., Wn. A runnable sketch follows.
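A sketch of this SGD loop in Python for a toy linear model with a hand-derived squared-error gradient; the data, epoch count, and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * X + 1.0                  # true relationship: w = 2, b = 1

w, b = rng.normal(), rng.normal()  # initialize Θ randomly
eta = 0.05                         # learning rate η

for epoch in range(200):           # for N epochs
    for x_i, y_i in zip(X, y):     # for each training example (x, y)
        err = (w * x_i + b) - y_i
        # Gradients of the squared error w.r.t. w and b
        dw, db = 2 * err * x_i, 2 * err
        # Update rule: Θ := Θ − η · ∂J(Θ)/∂Θ
        w, b = w - eta * dw, b - eta * db

print(w, b)   # should approach (2.0, 1.0)
```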
• Next: how to calculate the gradient, i.e. ∂J(Θ)/∂Θ?
How to calculate the gradient?
• Let's say we have a simple neural network with just three nodes: an input node X0, a hidden node h0, and an output node O0, connected by weights W1 (input to hidden) and W2 (hidden to output), with the loss J(Θ) computed on the output.
Let's look at W2. We want to see how the loss changes as W2 changes.
We calculate the derivative of J(Θ) w.r.t. W2. To do this we apply the chain rule: we take the derivative of J(Θ) w.r.t. O0 and multiply it by the derivative of O0 w.r.t. W2.
Similarly for W1: we take the derivative of J(Θ) w.r.t. O0, multiply by the derivative of O0 w.r.t. h0, and multiply by the derivative of h0 w.r.t. W1.
∂J(Θ)/∂W2 = ∂J(Θ)/∂O0 · ∂O0/∂W2
∂J(Θ)/∂W1 = ∂J(Θ)/∂O0 · ∂O0/∂h0 · ∂h0/∂W1
This is what is meant by backpropagating gradients: the gradient of one parameter often depends on the one computed after it, so they form a chain. This is the idea of backpropagation. A numeric sketch follows.
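A sketch of these two chain-rule products for the three-node network, assuming sigmoid activations at h0 and O0 and a squared-error loss (illustrative choices, with biases omitted for brevity).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass through X0 -> h0 -> O0
x0, y, W1, W2 = 0.5, 1.0, 0.8, -0.3
h0 = sigmoid(W1 * x0)
o0 = sigmoid(W2 * h0)
J = (o0 - y) ** 2

# Backward pass: local derivatives along the chain
# (sigmoid'(z) = σ(z) · (1 − σ(z)))
dJ_dO0 = 2 * (o0 - y)                 # ∂J/∂O0
dO0_dW2 = o0 * (1 - o0) * h0          # ∂O0/∂W2
dO0_dh0 = o0 * (1 - o0) * W2          # ∂O0/∂h0
dh0_dW1 = h0 * (1 - h0) * x0          # ∂h0/∂W1

dJ_dW2 = dJ_dO0 * dO0_dW2             # ∂J/∂W2 = ∂J/∂O0 · ∂O0/∂W2
dJ_dW1 = dJ_dO0 * dO0_dh0 * dh0_dW1   # ∂J/∂W1 = ∂J/∂O0 · ∂O0/∂h0 · ∂h0/∂W1
print(dJ_dW2, dJ_dW1)
```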
Recap
Now we have a good idea of:
• 1. How to calculate the gradient
• 2. How to move in the right direction
• 3. How to minimize our loss
Loss functions can be difficult to optimize
• Update rule: Θ := Θ − η · ∂J(Θ)/∂Θ
• The learning rate (η) represents the step size, i.e. how large a step we should take with each gradient update.
• Next: how to choose the learning rate?
How to choose the learning rate (η)?
• A small learning rate takes a long time to reach the minimum and may get stuck in a local minimum rather than the global minimum.
• A large learning rate can lead to divergence or an increasing loss.
• We need to find the Goldilocks zone in the middle.
• One approach is guessing: try a whole bunch of different values and see which gives the best result. This is very time-consuming and not the best use of our resources.
• A smarter approach is an adaptive learning rate, which adapts to how the learning is going.
• We can adapt and change the learning rate based on:
o How fast is learning happening?
o How large are the gradients?
o How large are the weights?
o We can even use a different learning rate for each parameter.
Adaptive Learning Rate Algorithms
• ADAM
• Momentum
• NAG
• Adagrad
• Adadelta (used by H2O)
• RMSProp
For more information, see:
• http://ruder.io/optimizing-gradient-descent/
Overfitting
• Neural networks are really powerful models, capable of learning all sorts of features and functions.
• Sometimes they can be too powerful, i.e. they can overfit and memorize the training examples.
• Overfitting means the model performs very well on the training set, but what it learnt is so specific to the training set that it does not carry over to the test set or to real-world examples.
Regularization
• Regularization is how we prevent overfitting in machine learning and neural networks.
• Regularization techniques (a dropout sketch follows this list):
• Dropout
• Early stopping
• Weight regularization
• …others
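As a concrete example, here is a minimal sketch of inverted dropout applied during training; the activation vector and keep probability of 0.8 are illustrative.

```python
import numpy as np

def dropout(h, keep_prob=0.8, rng=np.random.default_rng(0)):
    # Randomly zero out each activation with probability 1 - keep_prob,
    # then rescale the survivors so the expected activation is unchanged
    # (inverted dropout). At test time, dropout is simply not applied.
    mask = rng.random(h.shape) < keep_prob
    return h * mask / keep_prob

h = np.array([0.5, 1.2, -0.3, 0.9, 2.0])
print(dropout(h))
```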
Intro to H2O
• H2O is a fast, scalable, open-source machine learning and deep learning platform for building smarter applications.
• Using in-memory compression, H2O handles billions of data
rows in-memory, even with a small cluster.
• H2O includes many common machine learning algorithms,
such as generalized linear modeling (linear regression, logistic
regression, etc.), Naive Bayes, principal components analysis,
time series, k-means clustering, and others.
H2O’s Deep Learning
• H2O's Deep Learning is based on a multi-layer feedforward artificial neural network trained with stochastic gradient descent using backpropagation.
• A feedforward artificial neural network (ANN), also known as a deep neural network (DNN) or multi-layer perceptron (MLP), is the most common type of deep neural network and the only type supported natively in H2O-3.
• Other types of DNNs, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), are popular as well.
• MLPs work well on transactional (tabular) data; CNNs are a great choice particularly for image classification, and RNNs for sequential data (e.g. text, audio, time series).
• The H2O Deep Water project supports CNNs and RNNs through third-party integrations with deep learning libraries such as TensorFlow, Caffe, and MXNet.
Features
Features of H2O's Deep Learning include:
• Multi-threaded, distributed, parallel computation
• Adaptive learning rate for convergence
• Regularization options such as L1 and L2
• Automatic missing-value imputation
• Hyperparameter optimization using grid/random search (see the sketch after this list)
• For optimization it uses the Hogwild method, a lock-free parallelized version of SGD.
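A sketch of grid search over H2O Deep Learning hyperparameters via the Python API; the file name, response column, and candidate values are illustrative, and a classification response is assumed.

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()
train = h2o.import_file("train.csv")    # hypothetical dataset
y = "label"                             # hypothetical response column
train[y] = train[y].asfactor()          # treat as classification
x = [c for c in train.columns if c != y]

# Candidate values for two hyperparameters (illustrative)
hyper_params = {"hidden": [[32, 32], [64, 64]], "l1": [0.0, 1e-5]}

grid = H2OGridSearch(
    model=H2ODeepLearningEstimator(epochs=10, activation="Rectifier"),
    hyper_params=hyper_params,
)
grid.train(x=x, y=y, training_frame=train)
print(grid.get_grid(sort_by="logloss", decreasing=False))
```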
Parameters
• hidden – specifies the number of hidden layers and the number of neurons in each layer.
• epochs – specifies the number of passes to make over the training data.
• rate – specifies the learning rate.
• activation – specifies the type of activation function to use.
(In H2O the major activation functions are Tanh, Rectifier, and Maxout.)
H2O Deep Learning Demo
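A minimal end-to-end sketch of such a demo in Python, assuming a local CSV with a categorical response column named "label" (both assumptions, not from the slides).

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()                                   # start or connect to a local H2O cluster
data = h2o.import_file("data.csv")           # hypothetical dataset
data["label"] = data["label"].asfactor()     # classification response
train, valid = data.split_frame(ratios=[0.8], seed=42)

model = H2ODeepLearningEstimator(
    hidden=[64, 64],          # two hidden layers of 64 neurons each
    epochs=20,                # passes over the training data
    activation="Rectifier",   # ReLU-style activation
    l1=1e-5,                  # L1 regularization
)
model.train(x=[c for c in data.columns if c != "label"],
            y="label", training_frame=train, validation_frame=valid)
print(model.model_performance(valid))
```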
Thank You!
