DEEP LEARNING WORKSHOP
Dublin City University, 27-28 April 2017

Training Deep Networks with Backprop
Day 1 Lecture 4

Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow, Insight Centre for Data Analytics, Dublin City University
Recall: multi-layer perceptron
(Slides 2-4: diagrams of the multi-layer perceptron; not transcribed.)
Fitting deep networks to data
We need an algorithm to find good weight configurations.
This is an unconstrained continuous optimization problem.
We can use standard iterative optimization methods like gradient descent.
To use gradient descent, we need a way to find the gradient of the loss with
respect to the parameters (weights and biases) of the network.
Error backpropagation is an efficient algorithm for finding these gradients. It is essentially an application of the multivariate chain rule and dynamic programming.
In practice, computing the full gradient is expensive. Backpropagation is typically
used with stochastic gradient descent.
Gradient descent
If we had a way to compute the gradient of the loss with respect to the parameters, we could use gradient descent to optimize them by repeatedly stepping in the negative gradient direction:

w ← w − α ∇w L(w)

where α is the learning rate.
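As a concrete illustration (a minimal sketch, not from the slides; the model, data, and names are illustrative), gradient descent on least-squares linear regression, where the gradient has a simple closed form:

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Mean squared error and its gradient with respect to the weights."""
    residual = X @ w - y
    loss = 0.5 * np.mean(residual ** 2)
    grad = X.T @ residual / len(y)
    return loss, grad

# Synthetic data: y is a noisy linear function of X.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)
alpha = 0.1                      # learning rate
for step in range(200):
    loss, grad = loss_and_grad(w, X, y)
    w -= alpha * grad            # the gradient descent update
```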
Stochastic gradient descent
Computing the gradient for the full dataset at each step is slow
● Especially if the dataset is large!
For most losses we care about, the total loss can be expressed as a sum (or average) of losses on the individual examples, and the gradient of the total loss is the average of the gradients on individual examples.
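In symbols (a standard identity, not transcribed from the slides): for N training examples with per-example losses ℓᵢ,

```latex
L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \ell_i(\theta),
\qquad
\nabla_\theta L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta \ell_i(\theta)
```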
Stochastic gradient descent
SGD: estimate the gradient using a subset of the examples
● Pick a single random training example
● Compute the loss on this single example
● Compute the gradient of this loss (the stochastic gradient); it is a noisy estimate of the full gradient
● Take a step of gradient descent using this estimated gradient
Stochastic gradient descent
Advantages
● Very fast (only need to compute gradient on single example)
● Memory efficient (does not need the full dataset to compute gradient)
● Online (don’t need full dataset at each step)
Disadvantages
● Gradient is very noisy, may not always point in correct direction
● Convergence can be slower
Improvement
● Estimate the gradient on a small batch of training examples (say 50)
● Known as mini-batch stochastic gradient descent (sketched below)
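A minimal sketch of the mini-batch variant (illustrative; reuses the loss_and_grad helper from the gradient descent example above):

```python
import numpy as np

def minibatch_sgd(w, X, y, alpha=0.1, batch_size=50, epochs=10, seed=0):
    """Mini-batch SGD: step along noisy gradients from random batches."""
    rng = np.random.default_rng(seed)
    n = len(y)
    for epoch in range(epochs):
        order = rng.permutation(n)                      # reshuffle each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            _, grad = loss_and_grad(w, X[idx], y[idx])  # noisy estimate
            w = w - alpha * grad
    return w
```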
Finding the gradient with backprop
Combination of the chain rule and dynamic
programming
Chain rule: allows us to find gradient of
the loss with respect to any input,
activation, or parameter
Dynamic programming: reuse
computations from previous steps. You
don’t need to evaluate the full chain for
every parameter.
The chain rule
Easily differentiate compositions of functions.
(Slides 12-14: worked chain-rule examples; formulas not transcribed.)
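The rule itself, as a reminder (standard statement; the slides' worked examples are not transcribed):

```latex
% Scalar composition: if f(x) = g(h(x)), then
f'(x) = g'\big(h(x)\big)\, h'(x)

% In a network: if the loss L depends on a weight w only through
% an intermediate activation a, then
\frac{\partial L}{\partial w}
    = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial w}
```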
Dynamic programming
(Slide 15: illustration of reusing intermediate computations during backprop; not transcribed.)
Modular backprop
You could use the chain rule on all the individual neurons to compute the gradients
with respect to the parameters and backpropagate the error signal.
It is much more useful to use the layer abstraction, then define the backpropagation algorithm in terms of three operations that layers need to be able to do.
This is called modular backpropagation.
The layer abstraction
(Slides 17-20: the layer interface and the forward/backward equations for a linear layer and a ReLU layer; formulas not transcribed.)
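A minimal sketch of that interface (class and method names are illustrative, not the slides' notation): each layer implements a forward pass, a backward pass mapping the gradient at its output to the gradient at its input, and gradients for its own parameters.

```python
import numpy as np

class Linear:
    """Fully connected layer: y = x W + b."""
    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                       # cache input for the backward pass
        return x @ self.W + self.b

    def backward(self, dy):
        self.dW = self.x.T @ dy          # gradient w.r.t. parameters
        self.db = dy.sum(axis=0)
        return dy @ self.W.T             # gradient w.r.t. input

class ReLU:
    """Elementwise max(0, x); has no parameters."""
    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, dy):
        return dy * self.mask            # pass gradient only where x > 0
```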
Modular backprop
Using this idea, it is possible to create
many types of layers
● Linear (fully connected layers)
● Activation functions (sigmoid, ReLU)
● Convolutions
● Pooling
● Dropout
Once layers support the backward and
forward operations, they can be plugged
together to create more complex functions
(Figure: layers such as Convolution, ReLU, and Linear chained together; the error signal flows backward from the output error at layer L+1 to the input error at layer L, producing parameter gradients at each layer.)
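Continuing the sketch above (again illustrative): once every layer exposes forward and backward, a network is just an ordered list of layers, and backprop walks it in reverse.

```python
class Sequential:
    """Chain of layers supporting modular forward/backward passes."""
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, dy):
        for layer in reversed(self.layers):
            dy = layer.backward(dy)      # propagate the error signal back
        return dy

net = Sequential([Linear(784, 128), ReLU(), Linear(128, 10)])
```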
Implementation notes
Caffe and Torch
Libraries like Caffe and Torch implement
backpropagation this way
To define a new layer, you need to create a class and define the forward and backward operations
Theano and TensorFlow
Libraries like Theano and TensorFlow
operate on a computational graph
To define a new layer, you only need to specify the forward operation. Autodiff is used to automatically infer the backward operation.
You also don't need to implement backprop manually in Theano or TensorFlow: these libraries use computational graph optimizations to automatically factor out common computations.
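A toy illustration of the computational-graph idea (my sketch, far simpler than either library): record each operation on the forward pass, then replay the graph in reverse to infer the backward pass.

```python
class Var:
    """Toy reverse-mode autodiff node: a value plus local gradients."""
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents           # (parent, local_gradient) pairs

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, upstream=1.0):
        # Correct for small examples like this one; real libraries
        # traverse the graph in topological order to avoid repeated work.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)

x, w = Var(3.0), Var(2.0)
y = x * w + x                            # forward pass builds the graph
y.backward()                             # backward pass fills in gradients
print(w.grad, x.grad)                    # 3.0 and 3.0 (= w + 1)
```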
Practical tips for training deep nets
Choosing hyperparameters
Can already see we have lots of
hyperparameters to choose:
1. Learning rate
2. Regularization constant
3. Number of epochs
4. Number of hidden layers
5. Nodes in each hidden layer
6. Weight initialization strategy
7. Loss function
8. Activation functions
9. …
:(
Choosing these is a bit of an art.
Good news: in practice many
configurations work well
There are some reasonable heuristics, e.g.:
1. Try 0.1 for the learning rate. If this diverges,
divide by 3. Repeat.
2. Try an existing network architecture and
adapt it for your problem
3. Try to overfit the data with a big model, then
regularize
You can also do a hyperparameter search
if you have enough compute:
● Randomized search tends to work well
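A minimal sketch of randomized search (illustrative; train_and_validate is a hypothetical stand-in for your own training routine):

```python
import numpy as np

rng = np.random.default_rng(0)
best_loss, best_config = float("inf"), None
for trial in range(20):
    config = {
        "lr": 10 ** rng.uniform(-4, -1),      # log-uniform learning rate
        "l2": 10 ** rng.uniform(-6, -2),      # log-uniform weight decay
        "hidden": int(rng.choice([64, 128, 256, 512])),
    }
    val_loss = train_and_validate(config)     # hypothetical helper
    if val_loss < best_loss:
        best_loss, best_config = val_loss, config
```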
Choosing the learning rate
For first order optimization methods, we need to
choose a learning rate (aka step size)
● Too large: overshoots local minimum, loss
increases
● Too small: makes very slow progress, can get stuck
● Good learning rate: makes steady progress toward
local minimum
Usually want a higher learning rate at the start and
a lower one later on.
Common strategy in practice (sketched in code after the figure):
● Start off with a high LR (like 0.1 - 0.001)
● Run for several epochs (1 - 10)
● Decrease the LR by multiplying it by a constant factor (0.1 - 0.5)
(Figure: loss versus weight updates for α too large, a good α, and α too small.)
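That schedule as a sketch (step decay; the constants are the ranges quoted above):

```python
def step_decay_lr(epoch, initial_lr=0.1, factor=0.5, epochs_per_step=5):
    """Learning rate that drops by a constant factor every few epochs."""
    return initial_lr * factor ** (epoch // epochs_per_step)

# 0.1 for epochs 0-4, then 0.05 for 5-9, then 0.025 for 10-14, ...
lrs = [step_decay_lr(e) for e in range(15)]
```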
Training and monitoring progress
1. Split data into train, validation, and test sets
○ Keep 10-30% of data for validation
2. Fit model parameters on train set using SGD
3. After each epoch:
○ Test model on validation set and compute loss
■ Also compute whatever other metrics you
are interested in, e.g. top-5 accuracy
○ Save a snapshot of the model
4. Plot learning curves as training progresses
5. Stop when validation loss starts to increase
6. Use the model from the epoch with minimum validation loss (loop sketched below)
(Figure: training and validation loss versus epoch; the best model is at the validation-loss minimum.)
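A minimal sketch of that loop (illustrative; train_one_epoch and evaluate are hypothetical stand-ins for your own training and validation code):

```python
import copy

best_val, best_model = float("inf"), None
patience, bad_epochs = 5, 0
for epoch in range(100):
    train_one_epoch(net, train_set)        # hypothetical helper: SGD pass
    val_loss = evaluate(net, val_set)      # hypothetical helper: val loss
    if val_loss < best_val:                # new best: snapshot the model
        best_val, best_model = val_loss, copy.deepcopy(net)
        bad_epochs = 0
    else:                                  # validation loss not improving
        bad_epochs += 1
        if bad_epochs >= patience:         # stop once it keeps increasing
            break
```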
Divergence
Symptoms:
● Training loss keeps increasing
● Inf, NaN in loss
Try:
● Reduce learning rate
● Zero center and scale inputs/targets
● Check weight initialization strategy (monitor
gradients)
● Numerically check your gradients (finite-difference check sketched below)
● Clip gradient norm
(Figure: loss increasing with successive weight updates when α is too large.)
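A standard finite-difference gradient check, as a sketch (f is any function of the weights returning (loss, gradient), e.g. a closure over a batch of data):

```python
import numpy as np

def check_gradients(f, w, eps=1e-5, tol=1e-4):
    """Compare analytic gradients against central finite differences."""
    _, analytic = f(w)
    numeric = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus.flat[i] += eps
        w_minus.flat[i] -= eps
        numeric.flat[i] = (f(w_plus)[0] - f(w_minus)[0]) / (2 * eps)
    max_error = np.abs(analytic - numeric).max()
    return max_error < tol, max_error
```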
Slow convergence
Symptoms:
● Training loss decreases slowly
● Training loss does not decrease
Try:
● Increase learning rate
● Zero center and scale inputs/targets
● Check weight initialization strategy
(monitor gradients)
● Numerically check gradients
● Use ReLUs
● Increase batch size
● Change loss, model architecture
(Figure: loss decreasing very slowly with successive weight updates.)
Overfitting
Symptoms:
● Validation loss decreases at first, then starts
increasing
● Training loss continues to go down
Try:
● Find more training data
● Add stronger regularization
○ dropout, drop-connect, L2
● Data augmentation (flips, rotations, noise; sketched after this list)
● Reduce complexity of your model
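A minimal sketch of simple augmentations (illustrative; assumes image batches as float NumPy arrays of shape (N, H, W, C)):

```python
import numpy as np

def augment(images, rng):
    """Random horizontal flips plus small additive Gaussian noise."""
    out = images.copy()
    flip = rng.random(len(out)) < 0.5
    out[flip] = out[flip, :, ::-1]           # mirror selected images
    out += rng.normal(0.0, 0.01, size=out.shape)
    return out
```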
(Figure: training loss continues to fall with epochs while validation loss turns upward.)
Underfitting
Symptoms:
● Training loss decreases at first but then stops
● Training loss still high
● Training loss tracks validation loss
Try:
● Increase model capacity
○ Add more layers, increase layer size
● Use more suitable network architecture
○ E.g. multi-scale architecture
● Decrease regularization strength
(Figure: training and validation loss plateau together at a high value.)
Questions?