Evolution of Deep Learning and new advancements

Evolution of Deep Learning:
New Methods and Applications
Chitta Ranjan, Ph.D.
Pandora Media.
Feb 15, 2018
nk.chitta.ranjan@gmail.com
Outline
• Background
• Challenges
• Solutions
How does our brain work?
• How do we know where the ball will fall?
• Do we solve these equations in our head? No.
• Perhaps we break the problem into pieces and solve it.

The projectile equations (maximum height, range, and time of flight):
$$H = \frac{v_0^2 \sin^2\theta}{2g}, \qquad R = \frac{v_0^2 \sin 2\theta}{g}, \qquad T = \frac{2 v_0 \sin\theta}{g}$$
Traditional block model
One model for the whole problem
• One solver to solve it all (the projectile equations above, handled as a single block).
• Has limitations for complex problems.
Neural Network
• A neuron solves a piece of the big problem.
• Understand the inter-relationships between the pieces.
• Merge the small solutions to find the solution.
Neural Network
• Can we have bidirectional connections?
• Can we have edges connecting neurons in the same layer?
• Is a Neural Network an ensemble model?
Birth of Neural Network
Perceptron (1958)
• A non-linear computation cell.
• Non-linear cells became the building block of Neural Networks.

The cell computes a weighted sum of its inputs and applies a hard (non-linear) threshold:
$$z = \sum_i w_i x_i, \qquad y = \begin{cases} +1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}$$

Rosenblatt, F. (1960). Perceptron simulation experiments. Proceedings of the IRE, 48(3), 301-309.
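To make the computation concrete, here is a minimal sketch of the perceptron cell in Python/NumPy; the input and weight values are made-up illustrations, not from the slide.

import numpy as np

def perceptron(x, w):
    # Weighted sum of the inputs, then a hard (non-linear) threshold.
    z = np.dot(w, x)
    return 1 if z >= 0 else -1

x = np.array([0.5, -1.0, 2.0])   # hypothetical inputs x1, x2, x3
w = np.array([0.3, 0.8, 0.1])    # hypothetical weights w1, w2, w3
print(perceptron(x, w))          # -> -1 here, since w.x = -0.45 < 0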
Multi-layer Perceptron (1986)
• Nodes are Perceptrons.
• Layers of Perceptrons.
• Relationships (weights on arcs) found using the newly-developed Backpropagation.

The nonlinear part is critical. Without it, the network is equivalent to the big block model.

Rumelhart, David E., Geoffrey E. Hinton, and R. J. Williams. "Learning Internal Representations by Error Propagation". In David E. Rumelhart, James L. McClelland, and the PDP research group (editors), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press, 1986.
Some definitions
Terms labeled on the network diagram:
• Activation function
• Neuron/node
• Layer
• Network depth
• Network width
• Weight/connection/arc
• Input
• Output
We learned..

So far we learned
• Problem to be broken into pieces (at nodes).
• Non-linear decision makers.
Timeline
1958 — Perceptron
1969 — Perceptron criticized: the XOR problem
1974–80 — AI Winter I
1980 — Neocognitron, the precursor of the CNN
1986 — Multilayer Perceptron, trained with Backpropagation (inputs flow forward; errors flow backward)
1987–93 — AI Winter II
1997 — LSTM
1998 — CNN for handwritten images (MNIST)
2006 — DBN (Deep Belief Nets): faster, layer-wise learning
2012 — Dropout; ReLU; AlexNet, 8 layers
2014 — VGG Net, 19 layers; GoogLeNet, 22 layers*
2015 — ResNet, 152 layers
2017 — Capsules; SeLU

*The overall number of layers (independent building blocks) used to construct the network is about 100.
Challenges
Computation
GPU!
Challenges
Estimation
• Overfitting → Dropout
• Vanishing gradient → Activation functions
Dropout
Let’s take a step back..
• Learning becomes difficult in large networks.
• Off-the-shelf L1/L2 regularization was used.
• They did not work.
Silenced by L1 (L2)
• Regularization happens based on the predictive/information capability of a node*.
• The weak nodes are always (deterministically) thrown out.
• Weak nodes do not get a say.

*Loosely speaking
Co-adaptation
• Nodes co-adapt.
• They rely on the presence of other nodes.
• A few nodes do the heavy lifting while others do nothing.
Wide networks don’t really help.
Dropout (2014)
• Presence of a node is a matter of chance (a minimal sketch follows below).
• Addresses both problems: silencing and co-adaptation.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
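As a minimal sketch of the idea in Python/NumPy, assuming the common "inverted dropout" convention (rescale by 1/(1-p) at train time), which the slide does not specify:

import numpy as np

rng = np.random.default_rng(0)

def dropout(a, p=0.5, train=True):
    # Silence each activation with probability p: a Bernoulli gate per node.
    if not train:
        return a                          # at test time every node is present
    mask = rng.random(a.shape) >= p       # keep each node with probability 1 - p
    return a * mask / (1.0 - p)           # rescale so the expected activation is unchanged

a = np.array([0.2, 1.5, -0.7, 0.9])       # hypothetical activations
print(dropout(a, p=0.5))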
Dropout with Gaussian gate (2017)
• Regular dropout: multiply each activation $x_i$ with a Bernoulli RV, $x_i \cdot \epsilon_i$ where $\epsilon_i \sim \mathrm{Bern}(p)$.
• Generalization: multiply with any RV.
• Gaussian gates are found to improve dropout’s performance.

Molchanov, D., Ashukha, A., & Vetrov, D. (2017). Variational dropout sparsifies deep neural networks. arXiv preprint arXiv:1701.05369.
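The generalization is then a one-line change: swap the Bernoulli gate for a Gaussian one. A sketch assuming a mean-1 Gaussian gate (so no rescaling is needed); the variance alpha is an illustrative value:

import numpy as np

rng = np.random.default_rng(0)

def gaussian_dropout(a, alpha=0.25):
    # Multiply each activation by a Gaussian gate centered at 1,
    # so the expected activation stays unchanged.
    gate = rng.normal(loc=1.0, scale=np.sqrt(alpha), size=a.shape)
    return a * gate

a = np.array([0.2, 1.5, -0.7, 0.9])        # hypothetical activations
print(gaussian_dropout(a))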
Activation functions
Vanishing Gradient in Deep Networks
• Learning was still difficult in large networks.
• Activation functions at the time caused the gradient to vanish in the lower layers during Backpropagation.
• Difficult to learn the weights.
Deep networks don’t really help.
Vanishing gradient
• Because the sigmoid and tanh functions have saturation regions on both sides: their curves flatten at both extremes, so the derivative approaches zero there.
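The effect is easy to see numerically: the sigmoid’s derivative is $\sigma'(z) = \sigma(z)(1 - \sigma(z)) \le 0.25$, and Backpropagation multiplies one such factor per layer. A toy sketch (one made-up pre-activation value reused at every layer, weight factors omitted):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 2.0                                    # illustrative pre-activation at each layer
grad = 1.0
for layer in range(20):                    # 20 layers of sigmoid
    grad *= sigmoid(z) * (1.0 - sigmoid(z))
print(grad)                                # ~3e-20: the gradient has vanished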
New Activations
Resolving vanishing gradient
• Rectified Linear Unit (ReLU), 2013
• Exponential Linear Unit (ELU), 2016
Saturation region on only one side (the left) for these activations.

Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013, June). Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML (Vol. 30, No. 1, p. 3).
Clevert, D. A., Unterthiner, T., & Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289.
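Both activations are simple piecewise functions. A sketch of their definitions (alpha = 1.0 is a common ELU default, not stated on the slide):

import numpy as np

def relu(z):
    # Identity for z > 0, zero otherwise: saturates only on the left.
    return np.maximum(0.0, z)

def elu(z, alpha=1.0):
    # Identity for z > 0; smoothly saturates toward -alpha on the left.
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))
print(elu(z))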
We learned..

So far we learned
• Problem to be broken into pieces (at nodes).
• Non-linear decision makers.
• Challenges met
  • Overfitting: Dropout
  • Vanishing gradient: New activations
Model types
Types of Models
• Unsupervised
  • Deep Belief Networks (DBN)
• Supervised
  • Feed-forward Neural Network (FNN)
  • Recurrent Neural Network (RNN)
  • Convolutional Neural Network (CNN)
Deep Belief Networks (DBN)
Restricted Boltzmann Machine (RBM)
• Has two layers
  • Visible: think of the input data
  • Hidden: think of latent factors
• Learns features from the data that can regenerate the same training data.
• Bi-directional node relationship.

(Figure: the visible layer holds the data; the hidden layer holds the features.)
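A minimal sketch of the bi-directional relationship, assuming binary units with the standard sigmoid conditionals; the weights here are untrained placeholders, not a learned model:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))   # placeholder weights
b_h, b_v = np.zeros(n_hidden), np.zeros(n_visible)

v = rng.integers(0, 2, size=n_visible).astype(float)    # visible layer: the data
p_h = sigmoid(v @ W + b_h)                              # hidden given visible: the features
h = (rng.random(n_hidden) < p_h).astype(float)
p_v = sigmoid(h @ W.T + b_v)                            # visible given hidden: regenerate the data
print(p_h, p_v)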
Deep Belief Nets (2006)
Stacked RBMs/Autoencoders
• Fast greedy algorithm—learn one layer at a time.
• Feature extraction and unsupervised pre-training.
• MNIST digit classification: yielded much better accuracy.
• Used on sensor data.
• Became a dying technology after the vanishing gradient problem was resolved with the new ReLU and ELU activations.

Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554.
Multimodal Modeling (2012)
Comeback of DBN
• Used to create fused representations by combining features across modalities (e.g., image data + text data → "yellow, flower").
• Representations useful for classification and information retrieval.
• Works even if
  • some data modalities are missing, e.g. image-text;
  • observation frequencies differ, e.g. sensor data.

Srivastava, N., & Salakhutdinov, R. R. (2012). Multimodal learning with deep Boltzmann machines. In Advances in Neural Information Processing Systems (pp. 2222-2230).
Liu, Z., Zhang, W., Quek, T. Q., & Lin, S. (2017, March). Deep fusion of heterogeneous sensor data. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on (pp. 5965-5969). IEEE.
Feed-forward Neural Network (FNN)
FNN
• One of the earliest types of NN—the Multilayer Perceptron (MLP).
• No success story—learning networks more than 4 layers deep was difficult.
• Typically used only as the last (top) layers in other networks.
• Then came the SELU activation.
Scaled Exponential Linear Units (SELU), 2017
Self-normalizing Neural Networks. New life for FNNs.
• Activations automatically converge to zero mean and unit variance.
• Converges in the presence of noise and perturbations.
• Allows us to
  • train deep networks with many layers,
  • employ strong regularization schemes, and
  • make learning highly robust.

Klambauer, G., Unterthiner, T., Mayr, A., & Hochreiter, S. (2017). Self-normalizing neural networks. In Advances in Neural Information Processing Systems (pp. 972-981).
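SELU itself is just a scaled ELU with fixed constants (lambda ≈ 1.0507, alpha ≈ 1.6733, derived in the paper). A sketch:

import numpy as np

def selu(z, lam=1.0507, alpha=1.6733):
    # Scaled ELU: this fixed (lam, alpha) pair pushes activations to
    # self-normalize toward zero mean and unit variance across layers.
    return lam * np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(selu(z))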
Recurrent Neural Network (RNN)
RNN
• For temporal data.
• An RNN passes a message to its successor (sketched below).
• Learns dependencies with the past.
• Failed to learn long-term dependencies*.
• Then came LSTM.

Image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
*Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157-166.
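The "message to a successor" is simply the hidden state. A minimal sketch of one recurrent step with a tanh cell and random placeholder weights:

import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
Wx = rng.normal(scale=0.1, size=(d_h, d_in))   # input-to-hidden weights (placeholders)
Wh = rng.normal(scale=0.1, size=(d_h, d_h))    # hidden-to-hidden weights (placeholders)
b = np.zeros(d_h)

def rnn_step(x_t, h_prev):
    # The new hidden state mixes the current input with the message from the past.
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

h = np.zeros(d_h)
for t in range(5):                             # a toy sequence of 5 random inputs
    h = rnn_step(rng.normal(size=d_in), h)
print(h.shape)                                 # (8,)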
Long short-term memory (LSTM), 1997
• A special kind of RNN capable of learning long-term dependencies.
• The added gates regulate the addition or removal of passing information.
• Found powerful in:
  • natural language processing,
  • unsegmented connected handwriting recognition,
  • speech recognition.
• Gated Recurrent Units (GRUs), 2014
  • Fewer parameters than LSTM.
  • Performance comparable to, or lower than, LSTM (so far).

Image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
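For completeness, a minimal sketch of running a sequence through PyTorch’s built-in LSTM on made-up data; the sizes are illustrative:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 4)                 # 2 sequences, 5 time steps, 4 features each
out, (h_n, c_n) = lstm(x)                # the gated cell state c_n carries long-term memory
print(out.shape, h_n.shape, c_n.shape)   # (2, 5, 8), (1, 2, 8), (1, 2, 8)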
Attention Based Model (2015)
• CNN together with LSTM.
• Automatically learns
  • to fix its gaze on salient objects,
  • object alignments,
  • object relationships with the sequence of words.

Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., ... & Bengio, Y. (2015, June). Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning (pp. 2048-2057).
Fig. 1. Attention model architecture.
Fig. 2. Examples of attending to the correct object (white indicates the attended regions; underlines indicate the corresponding word).
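At its core, soft attention is a softmax-weighted average of the CNN’s region features. A stripped-down sketch with random placeholder features and scores, not the full Xu et al. model:

import numpy as np

rng = np.random.default_rng(0)
regions, d = 14 * 14, 512                  # e.g., a CNN feature map flattened into regions
features = rng.normal(size=(regions, d))   # one feature vector per image region
scores = rng.normal(size=regions)          # alignment scores for the current word

alpha = np.exp(scores - scores.max())      # softmax over regions...
alpha /= alpha.sum()                       # ...gives the attention weights ("where to gaze")
context = alpha @ features                 # weighted average fed to the LSTM decoder
print(context.shape)                       # (512,)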
Convolutional Neural Network (CNN)
CNN
• The workhorse of Deep Learning.
• The CNN revolution started with LeCun (1998)—it outperformed other methods on the handwritten-digit MNIST data.
• A CNN learns object-defining features.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
Fig. 1. LeCun (1998) architecture.
Fig. 2. Feature learning in CNN.
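A minimal LeNet-style sketch in PyTorch, in the spirit of (not identical to) the LeCun (1998) architecture:

import torch
import torch.nn as nn

# Convolutions learn local features; pooling and the dense head combine them.
model = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 10),       # 10 classes for MNIST-sized digits
)
x = torch.randn(1, 1, 28, 28)        # one 28x28 grayscale image
print(model(x).shape)                # torch.Size([1, 10])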
AlexNet (2012)
New estimation techniques
• Performed best on the ImageNet data—ILSVRC 2012 winner.
• A difficult dataset with 1,000 categories (labels).
• Similar to LeNet-5, with 5 convolutional and 3 dense layers, but with
  • Max Pooling,
  • ReLU nonlinearity,
  • Dropout regularization,
  • Data augmentation.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
GoogLeNet (2014)
Inception module
• Introduced the idea that CNN layers can be stacked in series and in parallel.
• A 22-layer CNN; winner of ILSVRC 2014.
• Lets the model decide on the convolution size, e.g. 3x3 or 5x5 (see the sketch after this slide):
  • puts each convolution in parallel,
  • concatenates the resulting feature maps before going to the next layer.

Image source: http://slazebni.cs.illinois.edu/spring17/lec01_cnn_architectures.pdf
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions (2014). arXiv preprint arXiv:1409.4842, 7.
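A simplified sketch of the parallel-convolutions idea; the channel counts are illustrative, and the real module also adds 1x1 bottlenecks and a pooling branch:

import torch
import torch.nn as nn

class MiniInception(nn.Module):
    # Run 1x1, 3x3, and 5x5 convolutions in parallel (let the model learn
    # which size matters) and concatenate the feature maps channel-wise.
    def __init__(self, c_in):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 16, kernel_size=1)
        self.b3 = nn.Conv2d(c_in, 16, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(c_in, 16, kernel_size=5, padding=2)

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

x = torch.randn(1, 32, 28, 28)
print(MiniInception(32)(x).shape)    # torch.Size([1, 48, 28, 28])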
Microsoft’s ResNet (2015)
Residual Network
• Went aggressive on adding layers.
• Evaluated depth up to 152 layers on ImageNet—8x deeper than VGG nets, but still lower complexity.
• How deep can we go?
Microsoft’s ResNet (2015)
Residual Network
• How deep can we go? With more layers,
  • training and test accuracy drop;
  • the degradation is due to difficulty in optimization.
• Introduced the Residual Network (sketched below).
  • Residual network idea: add additional information (the conv transformation F(x)) to the input data and pass it to the next layer.
  • Traditional CNNs: we learn a completely different transformation F(x) and pass it on for more transformation.
  • The authors found the residual network easier to optimize in very deep networks.

Fig. 1. Training error (left) and test error (right) on CIFAR-10 with 20- and 56-layer "plain" networks. The deeper network has higher training error, and thus test error.
Fig. 2. Residual learning: a building block.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
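The building block in Fig. 2 reduces to output = F(x) + x. A minimal sketch (batch normalization omitted for brevity; the identity shortcut assumes F(x) and x have matching shapes):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        f = self.conv2(self.relu(self.conv1(x)))   # the residual transformation F(x)
        return self.relu(f + x)                    # add the input back: F(x) + x

x = torch.randn(1, 16, 8, 8)
print(ResidualBlock(16)(x).shape)    # torch.Size([1, 16, 8, 8])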
Capsules (2017)
Going to the next level
• CNNs do not understand spatial relationships between features.
• Enter Capsules, which
  • preserve hierarchical pose relationships between object parts,
  • make the model understand that a new view is just another view of the same thing.
• Performance
  • Cut the error rate by 45%.
  • Used a fraction of the data compared to a CNN.

Fig. 1. For a CNN, the position of features does not matter.
Fig. 2. Capsules understand that all the images are the same object.
Image source: https://medium.com/ai³-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. In Advances in Neural Information Processing Systems (pp. 3859-3869).
We learned..
In summary, we learned
• Problem to be broken into pieces (at nodes).
• Non-linear decision makers.
• Challenges met
  • Overfitting: Dropout
  • Vanishing gradient: New activations
• Scaled Exponential Linear Units—will bring the FNN to the forefront.
• Capsules—closer to how the brain works.
In summary, we learned
• Multimodal models with DBM.
• LSTM+CNN for attention-based models.
• Inception: let the model figure out the conv size.
• Residual network: can learn deeper.
Thank you!
Why is a non-linear activation required?

Given input $x$ (the figure shows inputs $x_1, x_2, x_3$ feeding a two-layer network with output $\hat{y}$):

Layer 1: $z^{(1)} = W^{(1)} x + b^{(1)}$, $a^{(1)} = f^{(1)}(z^{(1)})$
Layer 2: $z^{(2)} = W^{(2)} a^{(1)} + b^{(2)}$, $a^{(2)} = f^{(2)}(z^{(2)})$

Processed information transfers between layers because of the non-linear activation. If the activation is linear, i.e. $a^{(1)} = z^{(1)}$, then it becomes equivalent to passing the original input $x$ to the next layer:

$$z^{(2)} = W^{(2)} a^{(1)} + b^{(2)} = W^{(2)} z^{(1)} + b^{(2)} = W^{(2)} (W^{(1)} x + b^{(1)}) + b^{(2)} = W^{(2)} W^{(1)} x + (W^{(2)} b^{(1)} + b^{(2)}) = W' x + b'$$

So $z^{(2)}$ is linear in $x$, and likewise $z^{(L)}$ for any depth $L$: any number of layers collapse to one.
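The collapse is easy to verify numerically: with a linear "activation", two random layers equal one composed layer.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

two_layers = W2 @ (W1 @ x + b1) + b2          # linear activation between the layers
W_p, b_p = W2 @ W1, W2 @ b1 + b2              # the single collapsed layer W', b'
one_layer = W_p @ x + b_p
print(np.allclose(two_layers, one_layer))     # True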