2. Regularization
• The complexity of a DNN can increase to the point where the training error keeps decreasing but the testing error does not.
• Regularization is a technique that makes slight modifications to the learning algorithm so that the model generalizes better.
• This in turn improves the model's performance on unseen data.
3. Regularization techniques
• Regularization refers to a set of techniques that lower the complexity of a neural network model during training and thus prevent overfitting.
• Common regularization techniques include:
1. L1 & L2
2. Dropout
3. Early stopping
4. Dropout
• Dropout works by making some hidden neurons of the neural network unavailable during part of the training.
• Dropping part of the neural network forces the remaining portion to be trained to still achieve a good score even without the dropped neurons.
• This decreases co-adaptation between neurons, which results in less overfitting.
• Dropout layers periodically drop some of their neurons during training. You can use dropout layers on regular feedforward neural networks.
• The following animation shows how dropout works:
https://yusugomori.com/projects/deep-learning/dropout-relu
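The mechanic above can be sketched in plain Python (a minimal illustration of so-called inverted dropout, not the Keras implementation): each activation is kept with probability 1 - rate, and the survivors are scaled up so the expected activation is unchanged.

```python
import random

def dropout(activations, rate, training=True, seed=None):
    """Inverted dropout: zero each activation with probability `rate`
    during training, and scale the survivors by 1 / (1 - rate) so the
    expected value of each activation stays the same."""
    if not training or rate == 0.0:
        return list(activations)  # at inference time, use all neurons
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [0.5, 1.2, -0.3, 0.8, 2.0, -1.1]
dropped = dropout(acts, rate=0.5, seed=0)
# roughly half of the activations are zeroed; the survivors are doubled
```

At inference time (training=False) the function returns all activations untouched, matching the point below that the final network uses every neuron.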
6. Dropout layer
• The discarded neurons and their connections are shown as dashed lines.
• The input layer has two input neurons as well as a bias neuron.
• The second layer is a dense layer with three neurons as well as a bias neuron.
• The third layer is a dropout layer with six regular neurons, even though the program has dropped 50% of them.
• While the program drops these neurons, it neither calculates nor trains them. However, the final neural network will use all of these neurons for the output. As previously mentioned, the program only temporarily discards the neurons.
7. Dropout is like bootstrapping
• Bootstrapping is one of the simplest ensemble techniques.
• Bootstrapping simply trains a number of neural networks to perform exactly the same task.
• However, each of these neural networks performs slightly differently because of the training techniques and the random numbers used in weight initialization.
• This process decreases overfitting through the consensus of differently trained neural networks.
• Dropout works somewhat like bootstrapping.
• You can think of each neural network that results from a different set of dropped neurons as an individual member of an ensemble.
• As training progresses, the program creates more neural networks in this way.
• However, dropout does not require the same amount of processing as bootstrapping does.
8. L1 and L2 Regularization
• The most common type of regularization for deep learning models is the one that keeps the weights of the network small.
• This type of regularization is called weight regularization and has two variations: L2 regularization and L1 regularization.
• In weight regularization, a penalty term is added to the loss function. This term is either the L2 norm (the sum of the squared values of the weights) or the L1 norm (the sum of the absolute values of the weights).
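The penalty term can be sketched in plain Python (an illustrative toy; the weights and the regularization strength lam are arbitrary example values):

```python
def l1_penalty(weights, lam):
    """L1 norm of the weights (sum of absolute values) times lam."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """Sum of squared weights times lam."""
    return lam * sum(w * w for w in weights)

def regularized_loss(base_loss, weights, lam=0.01, kind="l2"):
    """Add a weight penalty to the data loss, as in weight regularization."""
    penalty = l2_penalty(weights, lam) if kind == "l2" else l1_penalty(weights, lam)
    return base_loss + penalty

weights = [0.5, -1.5, 2.0]
regularized_loss(1.0, weights, lam=0.1, kind="l2")  # 1.0 + 0.1 * 6.5
regularized_loss(1.0, weights, lam=0.1, kind="l1")  # 1.0 + 0.1 * 4.0
```

Because the penalty grows with the weights, gradient descent is pushed toward smaller weights; with L1 the penalty is linear in each weight, which tends to drive some weights exactly to zero.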
9. Early Stopping
• Early stopping is a kind of cross-validation strategy in which we keep one part of the training set aside as a validation set.
• When we see that the performance on the validation set is getting worse, we immediately stop training the model. This is known as early stopping.
• In the given image, we would stop training at the dotted line, since after that point the model starts overfitting the training data.
10. Early stopping in Keras
• In Keras, we can apply early stopping using a callback. Below is sample code for it.
• Here, monitor denotes the quantity to be monitored, and 'val_loss' denotes the validation error.
• patience denotes the number of epochs with no further improvement after which training will be stopped.
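In Keras this is done with the EarlyStopping callback, e.g. EarlyStopping(monitor='val_loss', patience=3) passed to model.fit(..., callbacks=[...]). The patience logic itself can be sketched in plain Python (a minimal illustration, not the actual Keras code):

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop: the first
    epoch after which `patience` consecutive epochs show no improvement
    over the best validation loss seen so far. Returns len(val_losses)
    if training runs to completion."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss   # new best validation loss: reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)

# validation loss improves for three epochs, then worsens
losses = [0.9, 0.7, 0.6, 0.65, 0.7, 0.8, 0.9]
early_stopping_epoch(losses, patience=3)  # stops at epoch 6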